AITopics | Suo, Daniel

Collaborating Authors

Suo, Daniel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provable Length Generalization in Sequence Prediction via Spectral Filtering

Marsden, Annie, Dogariu, Evan, Agarwal, Naman, Chen, Xinyi, Suo, Daniel, Hazan, Elad

arXiv.org Artificial IntelligenceNov-1-2024

Sequence prediction is a fundamental problem in machine learning with widespread applications in natural language processing, time-series forecasting, and control systems. In this setting, a learner observes a sequence of tokens and iteratively predicts the next token, suffering a loss that measures the discrepancy between the predicted and the true token. Predicting future elements of a sequence based on historical data is crucial for tasks ranging from language modeling to autonomous control. A key challenge in sequence prediction is understanding the role of context length--the number of previous tokens used to make the upcoming prediction--and designing predictors that perform well with limited context due to computational and memory constraints. These resource constraints become particularly significant during the training phase of a predictor, where the computational cost of using long sequences can be prohibitive. Consequently, it is beneficial to design predictors that can learn from a smaller context length while still generalizing well to longer sequences. This leads us to the central question of our investigation: Can we develop algorithms that learn effectively using short contexts but perform comparably to models that use longer contexts?

length generalization, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2411.01035

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

FutureFill: Fast Generation from Convolutional Sequence Models

Agarwal, Naman, Chen, Xinyi, Dogariu, Evan, Feinberg, Vlad, Suo, Daniel, Bartlett, Peter, Hazan, Elad

arXiv.org Artificial IntelligenceOct-25-2024

We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill - a method for fast generation that applies to any sequence prediction algorithm based on convolutional operators. Our approach reduces the generation time requirement from quadratic to quasilinear relative to the context length. Additionally, FutureFill requires a prefill cache sized only by the number of tokens generated, which is smaller than the cache requirements for standard convolutional and attention-based models. We validate our theoretical findings with experimental evidence demonstrating correctness and efficiency gains in a synthetic generation task.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.03766

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Spectral State Space Models

Agarwal, Naman, Suo, Daniel, Chen, Xinyi, Hazan, Elad

arXiv.org Artificial IntelligenceFeb-7-2024

This paper studies sequence modeling for prediction tasks with long range dependencies. We propose a new formulation for state space models (SSMs) based on learning linear dynamical systems with the spectral filtering algorithm (Hazan et al. (2017)). This gives rise to a novel sequence prediction architecture we call a spectral state space model. Spectral state space models have two primary advantages. First, they have provable robustness properties as their performance depends on neither the spectrum of the underlying dynamics nor the dimensionality of the problem. Second, these models are constructed with fixed convolutional filters that do not require learning while still outperforming SSMs in both theory and practice. The resulting models are evaluated on synthetic dynamical systems and long-range prediction tasks of various modalities. These evaluations support the theoretical benefits of spectral filtering for tasks requiring very long range memory.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Artificial Intelligence

2312.06837

Country: North America > United States (0.14)

Genre:

Research Report (0.70)
Overview (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Online Learning for Obstacle Avoidance

Snyder, David, Booker, Meghan, Simon, Nathaniel, Xia, Wenhan, Suo, Daniel, Hazan, Elad, Majumdar, Anirudha

arXiv.org Artificial IntelligenceNov-5-2023

We approach the fundamental problem of obstacle avoidance for robotic systems via the lens of online learning. In contrast to prior work that either assumes worst-case realizations of uncertainty in the environment or a stationary stochastic model of uncertainty, we propose a method that is efficient to implement and provably grants instance-optimality with respect to perturbations of trajectories generated from an open-loop planner (in the sense of minimizing worst-case regret). The resulting policy adapts online to realizations of uncertainty and provably compares well with the best obstacle avoidance policy in hindsight from a rich class of policies. The method is validated in simulation on a dynamical system environment and compared to baseline open-loop planning and robust Hamilton- Jacobi reachability techniques. Further, it is implemented on a hardware example where a quadruped robot traverses a dense obstacle field and encounters input disturbances due to time delays, model uncertainty, and dynamics nonlinearities.

artificial intelligence, machine learning, obstacle, (16 more...)

arXiv.org Artificial Intelligence

2306.08776

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Energy (0.68)
Leisure & Entertainment > Games (0.67)
Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Benchmarking Neural Network Training Algorithms

Dahl, George E., Schneider, Frank, Nado, Zachary, Agarwal, Naman, Sastry, Chandramouli Shama, Hennig, Philipp, Medapati, Sourabh, Eschenhagen, Runa, Kasimbeg, Priya, Suo, Daniel, Bae, Juhan, Gilmer, Justin, Peirson, Abel L., Khan, Bilal, Anil, Rohan, Rabbat, Mike, Krishnan, Shankar, Snider, Daniel, Amid, Ehsan, Chen, Kongtao, Maddison, Chris J., Vasudev, Rakshith, Badura, Michal, Garg, Ankush, Mattson, Peter

arXiv.org Artificial IntelligenceJun-12-2023

Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a community, we are currently unable to reliably identify training algorithm improvements, or even determine the state-of-the-art training algorithm. In this work, using concrete experiments, we argue that real progress in speeding up training requires new benchmarks that resolve three basic challenges faced by empirical comparisons of training algorithms: (1) how to decide when training is complete and precisely measure training time, (2) how to handle the sensitivity of measurements to exact workload details, and (3) how to fairly compare algorithms that require hyperparameter tuning. In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of workload variants that make it possible to detect benchmark submissions that are more robust to workload changes than current widely-used methods. Finally, we evaluate baseline submissions constructed using various optimizers that represent current practice, as well as other optimizers that have recently received attention in the literature. These baseline results collectively demonstrate the feasibility of our benchmark, show that non-trivial gaps between methods exist, and set a provisional state-of-the-art for future benchmark submissions to try and surpass.

artificial intelligence, machine learning, training algorithm benchmark, (20 more...)

arXiv.org Artificial Intelligence

2306.07179

Country:

North America > Canada (0.28)
Europe > Germany > Baden-Württemberg (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Education (0.92)
Information Technology (0.92)
Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptive Gradient Methods at the Edge of Stability

Cohen, Jeremy M., Ghorbani, Behrooz, Krishnan, Shankar, Agarwal, Naman, Medapati, Sourabh, Badura, Michal, Suo, Daniel, Cardoze, David, Nado, Zachary, Dahl, George E., Gilmer, Justin

arXiv.org Artificial IntelligenceJul-29-2022

Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value -- the stability threshold of a gradient descent algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during minibatch training, especially as the batch size grows. Yet, even though adaptive methods train at the ``Adaptive Edge of Stability'' (AEoS), their behavior in this regime differs in a significant way from that of non-adaptive methods at the EoS. Whereas non-adaptive algorithms at the EoS are blocked from entering high-curvature regions of the loss landscape, adaptive gradient methods at the AEoS can keep advancing into high-curvature regions, while adapting the preconditioner to compensate. Our findings can serve as a foundation for the community's future understanding of adaptive gradient methods in deep learning.

artificial intelligence, machine learning, stability threshold, (15 more...)

arXiv.org Artificial Intelligence

2207.14484

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Continual Learning in Generative Adversarial Nets

Seff, Ari, Beatson, Alex, Suo, Daniel, Liu, Han

arXiv.org Machine LearningMay-23-2017

Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling.

deep learning, neural network, objective, (17 more...)

arXiv.org Machine Learning

1705.08395

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback