
When Do Flat Minima Optimizers Work?

Neural Information Processing Systems

Theoretical and empirical studies [21, 77, 9, 55, 49, 5, 12] postulate that such flatter regions generalize better than sharper minima, e.g., due to the flat minimizer's robustness against loss function shifts between train and test data, as illustrated in Fig. 1.


A GPU-Accelerated RAG-Based Telegram Assistant for Supporting Parallel Processing Students

Tel-Zur, Guy

arXiv.org Artificial Intelligence

This project addresses a critical pedagogical need: offering students continuous, on-demand academic assistance beyond conventional reception hours. I present a domain-specific Retrieval-Augmented Generation (RAG) system powered by a quantized Mistral-7B Instruct model and deployed as a Telegram bot. The assistant enhances learning by delivering real-time, personalized responses aligned with the "Introduction to Parallel Processing" course materials. GPU acceleration significantly improves inference latency, enabling practical deployment on consumer hardware. This approach demonstrates how consumer GPUs can enable affordable, private, and effective AI tutoring for HPC education.
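The retrieval step of such a RAG pipeline can be sketched with a toy bag-of-words retriever; the corpus snippets and function names below are hypothetical, and a real deployment like the one described would use a neural embedding model before passing the retrieved context to the quantized LLM:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (a stand-in for a real neural encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus chunks most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

# Hypothetical course-material chunks for illustration.
corpus = [
    "MPI point-to-point communication uses send and recv.",
    "OpenMP parallelizes loops with the parallel for directive.",
    "Amdahl's law bounds the speedup of a partially parallel program.",
]
context = retrieve("How do I parallelize a loop with OpenMP?", corpus, k=1)
# `context` would then be prepended to the prompt sent to the language model.
```

The key design point is that retrieval grounds the model's answers in course materials, so generation quality depends as much on the retriever as on the LLM itself.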






Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Hägele, Alexander, Bakouch, Elie, Kosson, Atli, Allal, Loubna Ben, Von Werra, Leandro, Jaggi, Martin

arXiv.org Artificial Intelligence

Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across different lengths for the same model size. We investigate the training behavior of a direct alternative -- constant learning rate and cooldowns -- and find that it scales predictably and reliably, similar to cosine. Additionally, we show that stochastic weight averaging yields improved performance along the training trajectory, without additional training costs, across different scales. Importantly, with these findings we demonstrate that scaling experiments can be performed with significantly reduced compute and GPU hours by utilizing fewer but reusable training runs.
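The stochastic-weight-averaging idea mentioned above can be illustrated with a generic sketch (not the paper's code): run plain SGD and, past some step, maintain a running mean of the iterates at no extra training cost. The function names and the toy quadratic loss are assumptions for illustration:

```python
import numpy as np

def train_with_swa(weights, grad_fn, lr, steps, swa_start):
    """Plain SGD; after `swa_start`, also keep an incremental mean of the
    iterates (stochastic weight averaging along the trajectory)."""
    swa_weights, n_avg = None, 0
    for step in range(steps):
        weights = weights - lr * grad_fn(weights)
        if step >= swa_start:
            if swa_weights is None:
                swa_weights, n_avg = weights.copy(), 1
            else:
                n_avg += 1
                swa_weights += (weights - swa_weights) / n_avg  # running mean
    return weights, swa_weights

# Toy quadratic loss 0.5 * ||w||^2 with noisy gradients.
rng = np.random.default_rng(0)
grad = lambda w: w + 0.1 * rng.standard_normal(w.shape)
final_w, swa_w = train_with_swa(np.ones(3), grad, lr=0.1, steps=200, swa_start=100)
```

Because the average is computed incrementally from checkpoints the run produces anyway, it adds no optimization steps, which is what makes it attractive for cheap scaling studies.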


Large Learning Rates Improve Generalization: But How Large Are We Talking About?

Lobacheva, Ekaterina, Pockonechnyy, Eduard, Kodryan, Maxim, Vetrov, Dmitry

arXiv.org Machine Learning

Inspired by recent research that recommends starting neural networks training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent training with a small LR or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter and validate our key findings in a more practical setting.
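The regime studied here, a large initial LR followed by a small one, can be written as a minimal two-phase schedule; the specific values and the switch point below are hypothetical, not the ranges the paper identifies:

```python
def two_phase_lr(step, total_steps, large_lr=0.1, small_lr=0.001, switch_frac=0.5):
    """Train with a large LR for the first part of training, then drop to a
    small LR for the rest. All hyperparameter values here are placeholders."""
    return large_lr if step < switch_frac * total_steps else small_lr

# Per-step learning rates for a 100-step run.
schedule = [two_phase_lr(s, 100) for s in range(100)]
```

The paper's point is that generalization is sensitive to where `large_lr` sits within a fairly narrow band, so in practice this value would be swept rather than fixed.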


LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages

Agarwal, Milind, Alam, Md Mahfuz Ibn, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence

Knowing the language of an input text/audio is a necessary first step for using almost every NLP tool such as taggers, parsers, or translation systems. Language identification is a well-studied problem, sometimes even considered solved; in reality, due to lack of data and computational challenges, current systems cannot accurately identify most of the world's 7000 languages. To tackle this bottleneck, we first compile a corpus, MCS-350, of 50K multilingual and parallel children's stories in 350+ languages. MCS-350 can serve as a benchmark for language identification of short texts and for 1400+ new translation directions in low-resource Indian and African languages. Second, we propose a novel misprediction-resolution hierarchical model, LIMIT, for language identification that reduces error by 55% (from 0.71 to 0.32) on our compiled children's stories dataset and by 40% (from 0.23 to 0.14) on the FLORES-200 benchmark. Our method can expand language identification coverage into low-resource languages by relying solely on systemic misprediction patterns, bypassing the need to retrain large models from scratch.
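The misprediction-resolution idea can be sketched as routing: when the base classifier outputs a label known to be systematically confused, defer to a second-stage model specialized for that confusion group. Everything below (the stand-in models, the keyword heuristic, the language pair) is a toy illustration, not the paper's architecture:

```python
def hierarchical_identify(text, base_model, expert_models, confusable):
    """Two-stage language ID: if the base prediction belongs to a known
    confusion group, re-run a specialized model for that group."""
    label = base_model(text)
    if label in confusable:
        label = expert_models[confusable[label]](text)
    return label

# Toy stand-ins: suppose the base model systematically mislabels Wolof as
# Fulfulde; a keyword check stands in for a trained second-stage classifier.
base = lambda text: "fulfulde"
confusable = {"fulfulde": "west_african_group"}
experts = {"west_african_group": lambda text: "wolof" if "naka" in text else "fulfulde"}
print(hierarchical_identify("naka nga def", base, experts, confusable))  # "wolof"
```

Only the small second-stage experts need training, which is why this design avoids retraining the large base model from scratch.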


Window-based Model Averaging Improves Generalization in Heterogeneous Federated Learning

Caldarola, Debora, Caputo, Barbara, Ciccone, Marco

arXiv.org Artificial Intelligence

Federated Learning (FL) aims to learn a global model from distributed users while protecting their privacy. However, when data are distributed heterogeneously, the learning process becomes noisy, unstable, and biased towards the last seen clients' data, slowing down convergence. To address these issues and improve the robustness and generalization capabilities of the global model, we propose WIMA (Window-based Model Averaging). WIMA aggregates global models from different rounds using a window-based approach, effectively capturing knowledge from multiple users and reducing the bias from the last ones. By adopting a windowed view on the rounds, WIMA can be applied from the initial stages of training. Importantly, our method introduces no additional communication or client-side computation overhead. Our experiments demonstrate the robustness of WIMA against distribution shifts and bad client sampling, resulting in smoother and more stable learning trends. Additionally, WIMA can be easily integrated with state-of-the-art algorithms. We extensively evaluate our approach on standard FL benchmarks, demonstrating its effectiveness.
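The window-based averaging idea can be sketched as follows: keep the last few global models produced by federated rounds and serve their mean, so no single round's client sample dominates. This is an assumed minimal version of the idea, not the authors' implementation:

```python
from collections import deque

import numpy as np

def window_average(history, new_global, window=3):
    """Append the newest global model, drop models outside the window,
    and return the elementwise mean of the models still in the window."""
    history.append(new_global)
    if len(history) > window:
        history.popleft()
    return np.mean(list(history), axis=0)

# Global models produced by three toy federated rounds (2-parameter model).
history = deque()
rounds = [np.array([1.0, 1.0]), np.array([3.0, 3.0]), np.array([2.0, 2.0])]
for g in rounds:
    served = window_average(history, g, window=2)
# After the last round, the served model averages rounds 2 and 3: [2.5, 2.5]
```

Since the server already stores recent global models, this averaging happens entirely server-side, which matches the claim of zero extra communication or client computation.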