
Collaborating Authors: Hall, David


Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) based on transformer architectures have sparked significant interest in understanding their inner workings. In this paper, we introduce a novel approach to modeling transformer architectures using highly flexible non-autonomous neural ordinary differential equations (ODEs). Our proposed model parameterizes all weights of attention and feed-forward blocks through neural networks, expressing these weights as functions of a continuous layer index. Through spectral analysis of the model's dynamics, we uncover an increase in eigenvalue magnitude that challenges the weight-sharing assumption prevalent in existing theoretical studies. We also leverage the Lyapunov exponent to examine token-level sensitivity, enhancing model interpretability. Our neural ODE transformer demonstrates performance comparable to or better than vanilla transformers across various configurations and datasets, while offering flexible fine-tuning capabilities that can adapt to different architectural constraints.
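
To make the continuous-layer idea concrete, here is a minimal, hedged sketch (not the authors' code): a small hypernetwork produces feed-forward weights as a function of the scalar layer index t, and the hidden state is evolved with a plain Euler discretization of the resulting non-autonomous ODE. Attention is omitted for brevity, and the names (HyperFFN, ode_forward) and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperFFN(nn.Module):
    """Produces feed-forward weights W1(t), W2(t) from the continuous layer index t."""
    def __init__(self, d_model: int, d_hidden: int, d_embed: int = 32):
        super().__init__()
        self.d_model, self.d_hidden = d_model, d_hidden
        self.time_embed = nn.Sequential(nn.Linear(1, d_embed), nn.SiLU())
        self.to_w1 = nn.Linear(d_embed, d_model * d_hidden)
        self.to_w2 = nn.Linear(d_embed, d_hidden * d_model)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        e = self.time_embed(t.view(1, 1))
        w1 = self.to_w1(e).view(self.d_hidden, self.d_model)   # W1(t)
        w2 = self.to_w2(e).view(self.d_model, self.d_hidden)   # W2(t)
        return torch.relu(x @ w1.T) @ w2.T

def ode_forward(x, block, num_steps: int = 12, t0: float = 0.0, t1: float = 1.0):
    """Euler integration of dx/dt = f(x, t), with f's weights depending on t."""
    dt = (t1 - t0) / num_steps
    t = torch.tensor(t0)
    for _ in range(num_steps):
        x = x + dt * block(x, t)    # one step along the continuous "layer" axis
        t = t + dt
    return x

x = torch.randn(4, 16, 64)          # (batch, sequence, d_model)
print(ode_forward(x, HyperFFN(64, 128)).shape)
```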


Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective

arXiv.org Machine Learning

Training language models currently requires pre-determining a fixed compute budget because the typical cosine learning rate schedule depends on the total number of steps. In contrast, the Warmup-Stable-Decay (WSD) schedule uses a constant learning rate to produce a main branch of iterates that can in principle continue indefinitely without a pre-specified compute budget. Then, given any compute budget, one can branch out from the main branch at a proper time with a rapidly decaying learning rate to produce a strong model. Empirically, WSD generates a non-traditional loss curve: the loss remains elevated during the stable phase but sharply declines during the decay phase. Towards explaining this phenomenon, we conjecture that the pretraining loss exhibits a river valley landscape, which resembles a deep valley with a river at its bottom. Under this assumption, we show that during the stable phase, the iterate undergoes large oscillations due to the high learning rate, yet it progresses swiftly along the river. During the decay phase, the rapidly dropping learning rate minimizes the iterate's oscillations, moving it closer to the river and revealing true optimization progress. Therefore, the sustained high-learning-rate phase and the fast-decaying phase are responsible for progress in the river and the mountain directions respectively, and both are critical. Our analysis predicts phenomena consistent with empirical observations and shows that this landscape can emerge from pretraining on a simple bi-gram dataset. Inspired by the theory, we introduce WSD-S, a variant of WSD that reuses previous checkpoints' decay phases and keeps only one main branch, where we resume from a decayed checkpoint. WSD-S empirically outperforms WSD and Cyclic-Cosine in obtaining multiple language model checkpoints across various compute budgets in a single run, for models ranging from 0.1B to 1.2B parameters.
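
As a concrete illustration of the schedule itself, the sketch below implements a WSD-style learning-rate function: linear warmup, a constant stable plateau that can run indefinitely while no decay point is set, and a rapid (here linear) decay branch taken once a compute budget is chosen. The function name and the exact decay shape are illustrative assumptions, not the paper's prescription.

```python
def wsd_lr(step, peak_lr, warmup_steps, decay_start=None, decay_steps=1, min_lr=0.0):
    """Warmup-Stable-Decay learning rate; decay_start=None keeps the main branch going."""
    if step < warmup_steps:                        # warmup: ramp up to the peak
        return peak_lr * (step + 1) / warmup_steps
    if decay_start is None or step < decay_start:  # stable: constant, budget-agnostic
        return peak_lr
    frac = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + frac * (min_lr - peak_lr)     # decay: rapid drop toward min_lr

# The main branch runs with decay_start=None; to produce a model for a chosen budget,
# branch from a saved checkpoint and replay the tail with decay_start set.
print([round(wsd_lr(s, 3e-4, 10, decay_start=80, decay_steps=20), 6)
       for s in (0, 9, 50, 80, 90, 99)])
```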


Object Registration in Neural Fields

arXiv.org Artificial Intelligence

Neural fields provide a continuous scene representation of 3D geometry and appearance in a way that has great promise for robotics applications. One functionality that unlocks unique use-cases for neural fields in robotics is object 6-DoF registration. In this paper, we provide an expanded analysis of the recent Reg-NF neural field registration method and its use-cases within a robotics context. We showcase the scenario of determining the 6-DoF pose of known objects within a scene using scene and object neural field models. We show how this may be used to better represent objects within imperfectly modelled scenes and to generate new scenes by substituting object neural field models into the scene.


BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

arXiv.org Artificial Intelligence

Large language models such as OpenAI's GPT-4 have become the dominant technology in modern natural language processing (Liu et al., 2023; Zhao et al., 2023). Trained on large corpora to predict the next token and refined with human feedback (Brown et al., 2020; Ouyang et al., 2022; Ziegler et al., 2020), these models develop impressive capabilities in areas such as summarization and question-answering (Zhang et al., 2023; Goyal et al., 2023; Karpukhin et al., 2020). While the focus has been on these models' performance when responding to general English prompts, it is clear there is potential for specialist models to impact biomedical research and healthcare (Arora and Arora, 2023; Shah et al., 2023; Thirunavukarasu et al., 2023). Such applications include information retrieval and summarization over the ever-expanding biomedical literature (Wang et al., 2021; Yang, 2020), as well as clinical information such as physician notes in electronic health records and radiology reports (Murray et al., 2021; Feblowitz et al., 2011; Zhang et al., 2018). Improving domain-specific language models will help accelerate biomedical discovery, drive down healthcare costs, and improve patient care. Large, general models like GPT-4 and Med-PaLM 2 have set new standards for performance on question-answering and information extraction (Kung et al., 2022; Singhal et al., 2023a,b), but there are several drawbacks to these models. They are costly to train and utilize. Compute for training and inference of large language models has increased 10- to 100-fold since 2015 (Sevilla et al., 2022), translating to extremely high financial and


Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields

arXiv.org Artificial Intelligence

Neural fields, i.e. coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods based on explicit representations such as point clouds, neural fields provide a continuous scene representation of 3D geometry and appearance that is compact and well suited to robotics applications. However, few prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural-field-based registration method that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to characterise the performance of our approach, while also discussing limitations and open challenges in utilising neural fields in unconstrained environments to provide future direction to the research community.
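
To make the bidirectional idea concrete, here is a hedged sketch (not the Reg-NF implementation): surface samples drawn from each field are mapped through the candidate 6-DoF transform (or its inverse) and scored by the other field's signed distance, so both fields pull the estimate toward agreement. The SDFs, the sampling, and the transform handling are stand-ins; in particular, the analytic spheres below replace trained neural field models, and a full method would also optimise the rotation.

```python
import torch

def transform(points, R, t):
    """Apply a rigid transform x -> R x + t to an (N, 3) point set."""
    return points @ R.T + t

def bidirectional_sdf_loss(sdf_a, sdf_b, pts_a, pts_b, R, t):
    # Surface points of A should land on B's zero level set under (R, t),
    # and surface points of B should do the same under the inverse transform.
    loss_ab = sdf_b(transform(pts_a, R, t)).abs().mean()
    loss_ba = sdf_a(transform(pts_b, R.T, -(R.T @ t))).abs().mean()
    return loss_ab + loss_ba

# Toy analytic SDFs (unit spheres) standing in for trained neural fields.
center = torch.tensor([0.5, 0.0, 0.0])
sdf_a = lambda p: p.norm(dim=-1) - 1.0
sdf_b = lambda p: (p - center).norm(dim=-1) - 1.0

pts_a = torch.nn.functional.normalize(torch.randn(256, 3), dim=-1)  # on A's surface
pts_b = pts_a + center                                              # on B's surface
R, t = torch.eye(3), torch.zeros(3, requires_grad=True)
loss = bidirectional_sdf_loss(sdf_a, sdf_b, pts_a, pts_b, R, t)
loss.backward()          # gradients w.r.t. the transform drive the registration
print(float(loss), t.grad)
```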


Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

arXiv.org Artificial Intelligence

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, while more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a lightweight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid changes of the Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which incurs negligible average per-step time and memory overhead. On language modeling with GPT models ranging from 125M to 1.5B parameters, Sophia achieves a 2x speed-up compared with Adam, reaching the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time. Theoretically, we show that Sophia, in a much simplified setting, adapts to the heterogeneous curvatures in different parameter dimensions, and thus has a run-time bound that does not depend on the condition number of the loss.
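
The update rule described above can be sketched in a few lines. The NumPy sketch below is a hedged, Sophia-style update, not the released optimizer: an EMA of gradients is divided by an EMA of a diagonal Hessian estimate that is refreshed only every k steps, and the ratio is clipped element-wise. Hyperparameter names and values (beta1, beta2, rho, gamma, k) are illustrative, and the Hessian estimator itself is left to the caller.

```python
import numpy as np

class SophiaLikeOptimizer:
    def __init__(self, lr=1e-4, beta1=0.9, beta2=0.99, rho=1.0, gamma=0.01,
                 eps=1e-12, hessian_interval=10):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.rho, self.gamma, self.eps, self.k = rho, gamma, eps, hessian_interval
        self.m = self.h = None
        self.step_count = 0

    def step(self, params, grad, hessian_diag_estimate=None):
        if self.m is None:
            self.m, self.h = np.zeros_like(params), np.zeros_like(params)
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        # Refresh the diagonal Hessian estimate only every k steps, keeping the
        # average per-step overhead small.
        if hessian_diag_estimate is not None and self.step_count % self.k == 0:
            self.h = self.beta2 * self.h + (1 - self.beta2) * hessian_diag_estimate
        self.step_count += 1
        # Pre-conditioned update with element-wise clipping to bound the step size.
        update = np.clip(self.m / np.maximum(self.gamma * self.h, self.eps),
                         -self.rho, self.rho)
        return params - self.lr * update

# Toy quadratic f(x) = 0.5 * x^T diag(c) x, so grad = c * x and Hessian diag = c.
c = np.array([100.0, 1.0]); x = np.array([1.0, 1.0])
opt = SophiaLikeOptimizer(lr=0.05)
for i in range(200):
    x = opt.step(x, c * x, hessian_diag_estimate=c)
print(x)  # both coordinates shrink at similar rates despite heterogeneous curvature
```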


Anticipatory Music Transformer

arXiv.org Artificial Intelligence

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models on the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that, over 20-second clips, an anticipatory model produces accompaniments with musicality comparable even to music composed by humans.
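
The interleaving step can be illustrated with a toy sketch. Below, timestamped control tokens are spliced into the event stream so that each control surfaces once the events have passed a chosen stopping time, here a fixed number of seconds before the control's own onset. The stopping rule, the anticipation interval, and the token format are simplifying assumptions for illustration, not the paper's exact construction.

```python
def interleave(events, controls, anticipation=5.0):
    """events, controls: lists of (time, token), each sorted by time."""
    merged, ci = [], 0
    for etime, etoken in events:
        # Emit any control whose (onset - anticipation) has been reached.
        while ci < len(controls) and controls[ci][0] - anticipation <= etime:
            merged.append(("control", controls[ci]))
            ci += 1
        merged.append(("event", (etime, etoken)))
    merged.extend(("control", c) for c in controls[ci:])
    return merged

events = [(0.0, "C4"), (1.0, "E4"), (2.0, "G4"), (8.0, "C5")]
controls = [(6.0, "F3")]            # accompaniment note to condition on
# The 6.0s control is emitted alongside the 1.0s event, 5 seconds ahead of its onset.
print(interleave(events, controls))
```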


Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

arXiv.org Artificial Intelligence

Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing new state-of-the-art in-context learning results in early evaluations and delivering 37-120%, 8-39%, and 80-290% relative gains against the vanilla LM (GPT-3.5), a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively. We release DSP at https://github.com/stanfordnlp/dsp
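
As a toy sketch of the demonstrate-search-predict control flow (not the DSP library's own API, which exposes a richer programming model at the URL above), plain callables stand in for the frozen LM and RM; the function names, prompt formats, and hop count are illustrative assumptions.

```python
def dsp_qa(question, lm, rm, demonstrations, hops=2, k=3):
    # Demonstrate: prepend pipeline-aware demonstrations to every prompt.
    demo_block = "\n\n".join(demonstrations)
    # Search: iteratively ask the LM for a query, retrieve passages with the RM.
    passages, query = [], question
    for _ in range(hops):
        query = lm(f"{demo_block}\n\nQuestion: {question}\n"
                   f"Context: {' '.join(passages)}\nNext search query:")
        passages.extend(rm(query, k=k))
    # Predict: ground the final answer in the retrieved passages.
    return lm(f"{demo_block}\n\nContext: {' '.join(passages)}\n"
              f"Question: {question}\nAnswer:")

# Stub LM/RM so the sketch runs end to end without external services.
lm = lambda prompt: "stub completion"
rm = lambda query, k=3: [f"passage about '{query}' #{i}" for i in range(k)]
print(dsp_qa("Who directed the film?", lm, rm, demonstrations=["Q: ... A: ..."]))
```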


Optimal Graph Search with Iterated Graph Cuts

AAAI Conferences

Informed search algorithms such as A* use heuristics to focus exploration on states with low total path cost. To the extent that heuristics underestimate forward costs, a wider cost radius of suboptimal states will be explored. For many weighted graphs, however, a small distance in terms of cost may encompass a large fraction of the unweighted graph. We present a new informed search algorithm, Iterative Monotonically Bounded A* (IMBA*), which first proves that no optimal paths exist in a bounded cut of the graph before considering larger cuts. We prove that IMBA* has the same optimality and completeness guarantees as A* and, in a non-uniform pathfinding application, we empirically demonstrate substantial speed improvements over classic A*.
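
A simplified sketch of the iterative bounded-search idea follows: run A* restricted to the cut of states with f(n) = g(n) + h(n) <= bound; if that cut contains no solution, enlarge the bound and retry. With an admissible heuristic, a goal found within the bound is optimal. This illustrates the strategy only, not the exact cut construction or proof machinery of IMBA*, and the grid, costs, and bound-doubling rule are illustrative assumptions.

```python
import heapq

def bounded_astar(start, goal, neighbors, h, bound):
    """A* that never expands states with f-value above `bound`. Returns cost or None."""
    frontier, best_g = [(h(start), 0, start)], {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if f > bound:
            return None                  # everything left in the frontier exceeds the cut
        if node == goal:
            return g
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None

def iterative_bounded_astar(start, goal, neighbors, h, init_bound=1.0, factor=2.0):
    bound = max(init_bound, h(start))
    while True:                          # assumes a path to the goal exists
        result = bounded_astar(start, goal, neighbors, h, bound)
        if result is not None:
            return result
        bound *= factor                  # no optimal path in this cut; widen it

# Toy weighted grid: move right/down on a 5x5 grid; entering column x=2 costs 5.
def neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1 + 4 * ((x + dx) == 2))
            for dx, dy in ((1, 0), (0, 1)) if x + dx < 5 and y + dy < 5]

h = lambda p: (4 - p[0]) + (4 - p[1])    # Manhattan distance, admissible here
print(iterative_bounded_astar((0, 0), (4, 4), neighbors, h))   # -> 12 on this grid
```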