
Collaborating Authors

 Fallen, Nova


Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo

arXiv.org Artificial Intelligence

As we scale to more massive machine learning models, the frequent synchronization demands inherent in data-parallel approaches create significant slowdowns, posing a critical challenge to further scaling. Recent work develops an approach (DiLoCo) that relaxes synchronization demands without compromising model quality. However, this work does not carefully analyze how DiLoCo's behavior changes with model size. In this work, we study the scaling law behavior of DiLoCo when training LLMs under a fixed compute budget. We focus on how algorithmic factors, including the number of model replicas, hyperparameters, and token budget, affect training in ways that can be accurately predicted via scaling laws. We find that DiLoCo scales both predictably and robustly with model size. When well-tuned, DiLoCo scales better with model size than data-parallel training, and can outperform data-parallel training even at small model sizes. Our results showcase a more general set of benefits of DiLoCo than previously documented, including increased optimal batch sizes, improved downstream generalization with scale, and improved evaluation loss for a fixed token budget.
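To make the training pattern the abstract refers to concrete, below is a minimal PyTorch-style sketch of a DiLoCo-like loop: each replica takes many local optimizer steps between synchronizations, and the averaged parameter delta is then applied with an outer momentum update. The function names, default hyperparameter values, inner-optimizer factory, and the plain SGD-with-momentum outer step are illustrative assumptions, not the paper's exact recipe.

```python
import copy
import torch

def diloco_style_train(global_model, replica_data_iters, make_inner_opt,
                       outer_lr=0.7, outer_momentum=0.9,
                       inner_steps=50, rounds=100):
    """Sketch: infrequent synchronization across model replicas.

    Each replica copies the global model, takes `inner_steps` local steps
    with its own optimizer (e.g. AdamW), and only then contributes to a
    synchronized outer update built from the averaged parameter delta.
    """
    outer_buf = [torch.zeros_like(p) for p in global_model.parameters()]
    for _ in range(rounds):
        deltas = [torch.zeros_like(p) for p in global_model.parameters()]
        for data_iter in replica_data_iters:          # one pass per replica
            local = copy.deepcopy(global_model)
            opt = make_inner_opt(local.parameters())  # e.g. torch.optim.AdamW
            for _ in range(inner_steps):              # no cross-replica sync here
                x, y = next(data_iter)
                loss = torch.nn.functional.cross_entropy(local(x), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
            for d, p_new, p_old in zip(deltas, local.parameters(),
                                       global_model.parameters()):
                d += (p_old.data - p_new.data) / len(replica_data_iters)
        # Outer step: treat the averaged delta as a pseudo-gradient and
        # apply it with momentum (the paper's outer optimizer differs).
        with torch.no_grad():
            for p, buf, d in zip(global_model.parameters(), outer_buf, deltas):
                buf.mul_(outer_momentum).add_(d)
                p.add_(buf, alpha=-outer_lr)
```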


Confidential Federated Computations

arXiv.org Artificial Intelligence

Since its introduction in 2017 [48, 42], federated learning (FL) has seen adoption by technology platforms working with private on-device data (cross-device federated learning) or proprietary server-side data (cross-silo federated learning). FL's appeal has been driven by its straightforward privacy advantages: raw data stays in the control of participating entities, with only focused updates sent for immediate aggregation, visible to the service provider. Systems that realize federated learning [18, 35, 51] run at scale today, reducing privacy risks in sensitive applications like mobile keyboards [33, 63, 21, 53] and voice assistants [12, 34]. However, basic federated learning offers an incomplete privacy story [19]: updates sent to the service provider can reveal private data unless updates are aggregated obliviously, and aggregated updates can encode individual data unless trained with a differentially private (DP) learning algorithm [30]. A dishonest service provider might log or inspect unaggregated messages, from which a great deal of information about an individual participant can be learned [15, 57]. This risk has been addressed with oblivious aggregation schemes that guarantee the service provider cannot inspect unaggregated messages, including secure multiparty computation (SMPC) from cohorts of honest devices [17], non-colluding SMPC-based secure aggregators [58], or hardware trusted execution environments (TEEs) [35].
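As a rough illustration of the two mitigations the abstract names (oblivious aggregation plus a differentially private learning algorithm), here is a minimal sketch of one DP federated-averaging round: each client update is clipped to bound its influence, the updates are summed (standing in for the oblivious aggregation step that a real deployment would perform via SMPC or a TEE so unaggregated updates are never visible), and calibrated Gaussian noise is added before the averaged update is applied. The function, its parameters, and the numpy representation are illustrative assumptions, not the system described in the paper.

```python
import numpy as np

def dp_fedavg_round(global_weights, client_updates,
                    clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Sketch of one differentially private federated-averaging round."""
    rng = rng or np.random.default_rng()

    # Clip each client's update so no single participant dominates the sum.
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))

    # In a confidential deployment this sum would be computed obliviously
    # (SMPC or TEE); the server would only ever see the aggregate.
    total = np.sum(clipped, axis=0)

    # Gaussian noise calibrated to the clipping norm gives the DP guarantee.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    avg_update = (total + noise) / len(client_updates)
    return global_weights + avg_update
```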