AITopics | badam

Collaborating Authors

badam

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

2c570b0f9938c7a58a612e5b00af9cc0-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 06:25:22 GMT

arxiv preprint arxiv, badam, experiment, (11 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Neural Information Processing SystemsDec-24-2025, 16:22:20 GMT

This work presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory efficient approach to the full parameter finetuning of large language models. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and 4 A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, the downstream performance evaluation based on MT-bench and math benchmarks shows that BAdam outperforms existing memory efficient baselines such as LoRA. It also demonstrates that BAdam can achieve comparable or even superior performance compared to Adam. Finally, the ablation study using SGD's update rule illustrates the suitability of BCD for finetuning LLMs. Our code can be easily integrated into any PyTorch-based codebase and is available at https://github.com/Ledzy/BAdam.

large language model, machine learning, natural language, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Add feedback

2c570b0f9938c7a58a612e5b00af9cc0-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 21:58:09 GMT

arxiv preprint arxiv, badam, experiment, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)

Add feedback

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Neural Information Processing SystemsMay-26-2025, 19:57:26 GMT

large language model, machine learning, natural language, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Zmushko, Philip, Beznosikov, Aleksandr, Takáč, Martin, Horváth, Samuel

arXiv.org Artificial IntelligenceNov-12-2024

With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA (Hu et al., 2021)), low-rank gradient projection (GaLore (Zhao et al., 2024a)), and blockwise optimization (BAdam (Luo et al., 2024)) have been proposed. However, in all these algorithms, the effective rank of the weight updates remains low-rank, which can lead to a substantial loss of information from the gradient. This loss can be critically important, especially during the pre-training stage. In this paper, we introduce FRUGAL (Full-Rank Updates with GrAdient spLitting), a new memory-efficient optimization framework. FRUGAL leverages gradient splitting to perform low-dimensional updates using advanced algorithms (such as Adam), while updates along the remaining directions are executed via statefree methods like SGD or signSGD (Bernstein et al., 2018). Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. We provide theoretical convergence guarantees for our framework when using SGDM for low-dimensional updates and SGD for state-free updates. Additionally, our method consistently outperforms concurrent approaches across various fixed memory budgets, achieving state-of-the-art results in pre-training and fine-tuning tasks while balancing memory efficiency and performance metrics. In recent years, Large Language Models (LLMs) such as GPT (OpenAI, 2023) and LLaMA-3 Dubey et al. (2024) have demonstrated remarkable performance across various disciplines (Brown, 2020; Yang et al., 2024; Romera-Paredes et al., 2024). However, a critical factor in achieving these results is the size of these models (Hoffmann et al., 2022). A larger number of parameters not only increases computational cost but also significantly raises memory requirements. For instance, training an 8 billion parameter LLaMA model in a 16-bit format necessitates each parameter to occupy 2 bytes, resulting in 16GB for storing the parameters and an additional 16GB for gradients.

arxiv preprint arxiv, frugal, projection, (13 more...)

arXiv.org Artificial Intelligence

2411.07837

Country:

Asia > Middle East > Jordan (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Luo, Qijun, Yu, Hengxu, Li, Xiao

arXiv.org Artificial IntelligenceMay-22-2024

This work presents BAdam, an optimization method that leverages the block coordinate descent framework with Adam as the inner solver. BAdam offers a memory efficient approach to the full parameter finetuning of large language models. We conduct theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B and Llama 3-8B models using a single RTX3090-24GB GPU. The results confirm BAdam's efficiency in terms of memory and running time. Additionally, the convergence verification indicates that BAdam exhibits superior convergence behavior compared to LoRA. Furthermore, the downstream performance evaluation using the MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam more effectively than LoRA. Our code is available at https://github.com/Ledzy/BAdam.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2404.02827

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hong Kong (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization

Foster, Jack, Brintrup, Alexandra

arXiv.org Artificial IntelligenceSep-15-2023

The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.

badam, continual learning, learning, (13 more...)

arXiv.org Artificial Intelligence

2309.08546

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Practical Bayesian Learning of Neural Networks via Adaptive Subgradient Methods

Salas, Arnold, Zohren, Stefan, Roberts, Stephen

arXiv.org Machine LearningNov-8-2018

We introduce a novel framework for the estimation of the posterior distribution of the weights of a neural network, based on a new probabilistic interpretation of adaptive subgradient algorithms such as AdaGrad and Adam. Having a confidence measure of the weights allows several shortcomings of neural networks to be addressed. In particular, the robustness of the network can be improved by performing weight pruning based on signal-to-noise ratios from the weight posterior distribution. Using the MNIST dataset, we demonstrate that the empirical performance of Badam, a particular instance of our framework based on Adam, is competitive in comparison to related Bayesian approaches such as Bayes By Backprop.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1811.03679

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback