
Collaborating Authors

 Blalock, Davis


$\mu$nit Scaling: Simple and Scalable FP8 LLM Training

arXiv.org Artificial Intelligence

Large Language Model training with 8-bit floating point (FP8) formats promises significant efficiency improvements, but reduced numerical precision makes training challenging. It is currently possible to train in FP8 only if one is willing to tune various hyperparameters, reduce model scale, or accept the overhead of computing dynamic scale factors. We demonstrate simple, scalable FP8 training that requires no dynamic scaling factors or special hyperparameters, even at large model sizes. Our method, $\mu$nit Scaling ($\mu$S), also enables simple hyperparameter transfer across model widths, matched numerics across training and inference, and other desirable properties. $\mu$nit Scaling is straightforward to implement, consisting of a set of minimal interventions based on a first-principles analysis of common transformer operations. We validate our method by training models from 1B to 13B parameters, performing all hidden linear layer computations in FP8. We achieve quality equal to higher precision baselines while also training up to 33% faster.
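Below is a minimal sketch (not the paper's implementation) of the core unit-scaling idea: initialize weights with unit variance and fold the usual 1/sqrt(fan_in) correction into the forward pass as a fixed constant, so activations stay near unit scale regardless of width. Near-unit-scale tensors can then be cast to an FP8 format with a constant scale factor instead of dynamically computed ones. The layer shapes and the casting comment are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn


class UnitScaledLinear(nn.Module):
    """Sketch of a unit-scaled linear layer: unit-variance weights, fixed output scale."""

    def __init__(self, fan_in: int, fan_out: int):
        super().__init__()
        # Unit-variance initialization; the fan-in correction is applied in forward().
        self.weight = nn.Parameter(torch.randn(fan_out, fan_in))
        self.scale = 1.0 / math.sqrt(fan_in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # In an FP8 pipeline, x and self.weight would be cast to an FP8 dtype here
        # with a static scale of 1.0, since both are already approximately unit scale.
        return (x @ self.weight.t()) * self.scale


x = torch.randn(8, 512)           # unit-scale input activations
layer = UnitScaledLinear(512, 512)
y = layer(x)
print(y.std())                    # stays close to 1 regardless of layer width
```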


Dynamic Masking Rate Schedules for MLM Pretraining

arXiv.org Artificial Intelligence

Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectively, compared to fixed rate baselines. These gains come from exposure to both high and low masking rate regimes, providing benefits from both settings. Our results demonstrate that masking rate scheduling is a simple way to improve the quality of masked language models, achieving up to a 1.89x speedup in pretraining for BERT-base as well as a Pareto improvement for BERT-large.
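A minimal sketch of a linearly decreasing masking-rate schedule is shown below. The endpoint values (30% decaying to 15%) are illustrative assumptions rather than the paper's exact settings; the point is that the rate is a function of training progress rather than a fixed constant.

```python
import random


def masking_rate(step: int, total_steps: int,
                 start_rate: float = 0.30, end_rate: float = 0.15) -> float:
    """Linearly interpolate the masking rate over the course of pretraining."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start_rate + frac * (end_rate - start_rate)


def mask_tokens(token_ids, step, total_steps, mask_id=103):
    """Replace each token with [MASK] with probability given by the schedule."""
    rate = masking_rate(step, total_steps)
    return [mask_id if random.random() < rate else t for t in token_ids]


print(masking_rate(0, 10_000))       # 0.30 at the start of pretraining
print(masking_rate(10_000, 10_000))  # 0.15 at the end
```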


Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities

arXiv.org Artificial Intelligence

Although deep learning has made great progress in recent years, the exploding economic and environmental costs of training neural networks are becoming unsustainable. To address this problem, there has been a great deal of research on *algorithmically-efficient deep learning*, which seeks to reduce training costs not at the hardware or implementation level, but through changes in the semantics of the training program. In this paper, we present a structured and comprehensive overview of the research in this field. First, we formalize the *algorithmic speedup* problem, then we use fundamental building blocks of algorithmically efficient training to develop a taxonomy. Our taxonomy highlights commonalities of seemingly disparate methods and reveals current research gaps. Next, we present evaluation best practices to enable comprehensive, fair, and reliable comparisons of speedup techniques. To further aid research and applications, we discuss common bottlenecks in the training pipeline (illustrated via experiments) and offer taxonomic mitigation strategies for them. Finally, we highlight some unsolved research challenges and present promising future directions.


Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates

arXiv.org Artificial Intelligence

Benchmarking the tradeoff between neural network accuracy and training time is computationally expensive. Here we show how a multiplicative cyclic learning rate schedule can be used to construct a tradeoff curve in a single training run. We generate cyclic tradeoff curves for combinations of training methods such as Blurpool, Channels Last, Label Smoothing and MixUp, and highlight how these cyclic tradeoff curves can be used to efficiently evaluate the effects of algorithmic choices on network training.

In order to make meaningful improvements in neural network training efficiency, ML practitioners must be able to compare different choices of network architectures, hyperparameters, and training algorithms. One straightforward way to do this is to characterize the tradeoff between accuracy and training time with a "tradeoff curve." Tradeoff curves can be generated by varying the length of training for each model configuration; longer training runs take more time but tend to reach higher quality (Figure 1C). For a fixed model and task configuration, this method of generating tradeoff curves yields an estimate of the theoretical Pareto frontier, i.e. the set of all of the best possible tradeoffs between training time and accuracy, where any further attempt to improve one of these metrics worsens the other.
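The sketch below illustrates the single-run idea under stated assumptions: within each cycle the learning rate decays multiplicatively from a peak toward zero, and the model is evaluated at the end of every cycle, yielding one (training time, accuracy) point per cycle. The cycle length, peak learning rate, and decay factor are illustrative, not the paper's values.

```python
def cyclic_lr(step: int, cycle_len: int, peak_lr: float = 0.1,
              decay: float = 0.99) -> float:
    """Multiplicative decay within each cycle; the LR resets to peak_lr at each cycle start."""
    return peak_lr * decay ** (step % cycle_len)


cycle_len, total_steps = 1_000, 5_000
checkpoints = []
for step in range(total_steps):
    lr = cyclic_lr(step, cycle_len)
    # optimizer.param_groups[0]["lr"] = lr; train_one_step(...)  # placeholder training update
    if (step + 1) % cycle_len == 0:
        # Evaluating the model here would give the accuracy coordinate; the elapsed
        # training time gives the other coordinate of the tradeoff point.
        checkpoints.append(step + 1)

print(checkpoints)  # [1000, 2000, 3000, 4000, 5000]: one tradeoff point per cycle
```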


Multiplying Matrices Without Multiplying

arXiv.org Machine Learning

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs $100\times$ faster than exact matrix products and $10\times$ faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling (the core operations of our method) could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.
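For intuition, here is a simplified, product-quantization-style sketch of lookup-table approximate matrix multiplication in the spirit described above. This is not the paper's algorithm: it picks prototypes by sampling and assigns codes with exact distance computations, whereas the paper's method learns cheap hash functions so the encoding step needs no multiplies. All shapes and prototype counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(A, prototypes):
    """Assign each row of each subspace of A to its nearest prototype."""
    n_sub, K, sub_d = prototypes.shape
    codes = np.empty((A.shape[0], n_sub), dtype=np.int64)
    for s in range(n_sub):
        block = A[:, s * sub_d:(s + 1) * sub_d]                  # (N, sub_d)
        d2 = ((block[:, None, :] - prototypes[s]) ** 2).sum(-1)  # (N, K)
        codes[:, s] = d2.argmin(axis=1)
    return codes


def build_luts(B, prototypes):
    """Precompute prototype . B per subspace (possible because B is known ahead of time)."""
    n_sub, K, sub_d = prototypes.shape
    luts = np.empty((n_sub, K, B.shape[1]))
    for s in range(n_sub):
        luts[s] = prototypes[s] @ B[s * sub_d:(s + 1) * sub_d, :]
    return luts


def approx_matmul(codes, luts):
    """Approximate A @ B by summing table lookups; no multiplies involving A at this stage."""
    n_sub = luts.shape[0]
    return sum(luts[s][codes[:, s]] for s in range(n_sub))


N, D, M, n_sub, K = 256, 64, 32, 8, 16
sub_d = D // n_sub
A = rng.standard_normal((N, D))
B = rng.standard_normal((D, M))

# Prototypes per subspace, here simply sampled from A's rows for brevity.
prototypes = np.stack([
    A[rng.choice(N, K, replace=False), s * sub_d:(s + 1) * sub_d]
    for s in range(n_sub)
])

approx = approx_matmul(encode(A, prototypes), build_luts(B, prototypes))
exact = A @ B
print(np.corrcoef(approx.ravel(), exact.ravel())[0, 1])  # rough agreement with the exact product
```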


What is the State of Neural Network Pruning?

arXiv.org Machine Learning

Neural network pruning---the task of reducing the size of a network by removing parameters---has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.
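As context for the task the surveyed methods tackle, here is a minimal sketch of one common baseline, global magnitude pruning. It is an illustrative example only, not ShrinkBench or any specific technique from the meta-analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)), rng.standard_normal((64, 10))]


def global_magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of parameters across all layers."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_mags, sparsity)
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]


pruned = global_magnitude_prune(weights, sparsity=0.9)
print(sum((w == 0).sum() for w in pruned) / sum(w.size for w in weights))  # ~0.9 of parameters removed
```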


Multiple Instance Learning for ECG Risk Stratification

arXiv.org Machine Learning

In this paper, we apply a multiple instance learning paradigm to signal-based risk stratification for cardiovascular outcomes. In contrast to methods that require handcrafted features or domain knowledge, our method learns a representation with state-of-the-art predictive power from the raw ECG signal. We accomplish this by leveraging the multiple instance learning framework. This framework is particularly valuable to learning from biometric signals, where patient-level labels are available but signal segments are rarely annotated. We make two contributions in this paper: 1) reframing risk stratification for cardiovascular death (CVD) as a multiple instance learning problem, and 2) using this framework to design a new risk score, for which patients in the highest quartile are 15.9 times more likely to die of CVD within 90 days of hospital admission for an acute coronary syndrome.
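The sketch below shows the general shape of such a multiple-instance-learning setup: each patient is a "bag" of unlabeled ECG segments (instances), a shared encoder scores each segment, and the segment scores are pooled into a single patient-level prediction trained against the patient-level label. The encoder architecture and mean pooling are illustrative choices, not the paper's model.

```python
import torch
import torch.nn as nn


class MILRiskModel(nn.Module):
    """Sketch of a bag-level risk model built from per-segment instance scores."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # shared per-segment encoder
            nn.Conv1d(1, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (num_segments, segment_len) -> one risk score per segment
        scores = self.encoder(bag.unsqueeze(1)).squeeze(-1)
        return torch.sigmoid(scores.mean())      # pool to a single patient-level risk


model = MILRiskModel()
bag = torch.randn(12, 256)           # 12 ECG segments from one patient
patient_label = torch.tensor(1.0)    # patient-level outcome (e.g., CVD within 90 days)
risk = model(bag)
loss = nn.functional.binary_cross_entropy(risk, patient_label)
print(risk.item(), loss.item())
```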