Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

Joo, Taejong, Klabjan, Diego

arXiv.org Artificial Intelligence

Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear to what extent the transformers optimally learn in-context compared to principled learning algorithms. To bridge this gap, we introduce a new framework for quantifying optimality of ICL as a learning algorithm in stylized settings. Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes optimal estimator, its efficiency significantly deteriorates in long context. Through an information-theoretic analysis, we show that the diminishing efficiency is inherent to ICL. These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods without the diminishing efficiency.
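As a concrete reference point, the Bayes optimal baseline in such stylized settings is typically the posterior-mean estimator for a linear task with a Gaussian prior, whose risk shrinks as the number of in-context demonstrations grows. A minimal sketch of that baseline (the dimensions, noise level, and function names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5          # task dimension
noise = 0.1    # observation noise std
prior_var = 1.0

def bayes_optimal_predict(X, y, x_query):
    # Posterior-mean (ridge) prediction for y = w.x + eps with w ~ N(0, prior_var * I):
    # the principled estimator ICL efficiency is measured against in stylized settings.
    A = X.T @ X / noise**2 + np.eye(d) / prior_var
    w_post = np.linalg.solve(A, X.T @ y / noise**2)
    return x_query @ w_post

def avg_risk(n, trials=200):
    # Average squared prediction error over random tasks given n demonstrations.
    errs = []
    for _ in range(trials):
        w = rng.normal(0.0, np.sqrt(prior_var), d)
        X = rng.normal(size=(n, d))
        y = X @ w + rng.normal(0.0, noise, n)
        xq = rng.normal(size=d)
        errs.append((bayes_optimal_predict(X, y, xq) - xq @ w) ** 2)
    return float(np.mean(errs))

# The Bayes estimator keeps improving with context length; the paper's finding
# is that ICL initially matches this curve but falls behind in long context.
r5, r50 = avg_risk(5), avg_risk(50)
```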


Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction

Rushing, Bruce

arXiv.org Artificial Intelligence

Mixture of experts is a machine learning method that aggregates the predictions of specialized experts. This method often outperforms Bayesian methods despite the Bayesian having stronger inductive guarantees. We argue that this is due to the greater functional capacity of mixture of experts. We prove that, in a limiting case, mixture of experts has greater capacity than equivalent Bayesian methods, and we confirm this through experiments on non-limiting cases. Finally, we conclude that mixture of experts is a type of abductive reasoning in the Peircian sense of hypothesis construction.
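The capacity gap can be illustrated with a toy contrast: Bayesian model averaging combines experts with input-independent posterior weights, while a mixture-of-experts gate reweights the same experts per input, so it can realize functions no fixed convex combination can. A hedged sketch (the experts and gate below are invented for illustration):

```python
import numpy as np

# Two toy "experts", each a fixed predictor on scalar inputs.
experts = [lambda x: np.sin(x), lambda x: 0.5 * x]

def bma_predict(x, posterior=(0.5, 0.5)):
    # Bayesian model averaging: one input-independent weight per model.
    return sum(p * f(x) for p, f in zip(posterior, experts))

def moe_predict(x, gate_w=4.0):
    # Mixture of experts: a gate reweights experts per input, so the
    # combined predictor can switch between expert behaviors across inputs.
    g = 1.0 / (1.0 + np.exp(-gate_w * x))   # input-dependent gating weight
    return g * experts[0](x) + (1 - g) * experts[1](x)
```

For large positive inputs the gated mixture tracks the first expert and for large negative inputs the second, a piecewise behavior that no fixed BMA weighting of these two experts can reproduce.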


Local Bayesian Dirichlet mixing of imperfect models

Kejzlar, Vojtech, Neufcourt, Léo, Nazarewicz, Witold

arXiv.org Machine Learning

To improve the predictability of complex computational models in experimentally unknown domains, we propose a Bayesian statistical machine learning framework utilizing the Dirichlet distribution that combines the results of several imperfect models. This framework can be viewed as an extension of Bayesian stacking. To illustrate the method, we study the ability of Bayesian model averaging and mixing techniques to mine nuclear masses. We show that both global and local mixtures of models reach excellent performance in prediction accuracy and uncertainty quantification and are preferable to classical Bayesian model averaging. Additionally, our statistical analysis indicates that improving model predictions through mixing, rather than mixing of corrected models, leads to more robust extrapolations.
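Dirichlet mixing of this kind can be sketched as placing a Dirichlet prior over the mixture weights and approximating their posterior given observed data; the toy model predictions, noise level, and importance-sampling approximation below are illustrative, not the paper's inference scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

# Predictions of K = 2 imperfect models and observed data at n = 3 points.
preds = np.array([[1.0, 2.1, 2.9],
                  [1.4, 1.8, 3.3]])
obs = np.array([1.1, 2.0, 3.0])
sigma = 0.2  # assumed observation noise std

# Dirichlet prior over mixture weights; posterior approximated by
# importance sampling with a Gaussian likelihood for the mixed prediction.
samples = rng.dirichlet(np.ones(preds.shape[0]), size=5000)  # (S, K)
mix = samples @ preds                                        # (S, n)
loglik = -0.5 * np.sum((mix - obs) ** 2, axis=1) / sigma**2
w = np.exp(loglik - loglik.max())
w /= w.sum()
post_weights = w @ samples  # posterior-mean mixture weights, sums to 1
```

Here the first model fits the data more closely, so its posterior-mean weight dominates, which is the behavior one expects from likelihood-driven mixing of imperfect models.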


Collapsed Inference for Bayesian Deep Learning

Zeng, Zhe, Broeck, Guy Van den

arXiv.org Artificial Intelligence

Bayesian neural networks (BNNs) provide a formalism to quantify and calibrate uncertainty in deep learning. Current inference approaches for BNNs often resort to few-sample estimation for scalability, which can harm predictive performance, while the alternatives tend to be prohibitively expensive computationally. We tackle this challenge by revealing a previously unseen connection between inference on BNNs and volume computation problems. With this observation, we introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples. It improves over a Monte Carlo sample by limiting sampling to a subset of the network weights while pairing it with a closed-form conditional distribution over the rest. A collapsed sample represents uncountably many models drawn from the approximate posterior and thus yields higher sample efficiency. Further, we show that the marginalization of a collapsed sample can be solved analytically and efficiently despite the non-linearity of neural networks by leveraging existing volume computation solvers. Our proposed use of collapsed samples achieves a balance between scalability and accuracy. On various regression and classification tasks, our collapsed Bayesian deep learning approach demonstrates significant improvements over existing methods and sets a new state of the art in terms of uncertainty estimation as well as predictive performance.
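The collapsed-sample idea can be illustrated in a toy setting: sample only a subset of the weights (here, a random hidden layer) and marginalize the remaining output-layer weights in closed form via Bayesian linear regression, so each draw covers a whole conditional distribution rather than a point. This is an analogy only; the paper's scheme handles general non-linearities via volume computation solvers, and all names and sizes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data.
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

def collapsed_sample_predict(x_new, n_samples=20, hidden=8, noise=0.1, prior=1.0):
    # One "collapsed sample" per iteration: draw the hidden-layer weights,
    # then integrate out the output-layer weights analytically
    # (Gaussian prior + Gaussian likelihood => closed-form posterior mean).
    preds = []
    for _ in range(n_samples):
        W = rng.normal(0.0, prior, size=(3, hidden))          # sampled weight subset
        Phi = np.tanh(X @ W)                                  # hidden features
        phi_new = np.tanh(x_new @ W)
        A = Phi.T @ Phi / noise**2 + np.eye(hidden) / prior   # posterior precision
        m = np.linalg.solve(A, Phi.T @ y / noise**2)          # posterior mean
        preds.append(phi_new @ m)
    return float(np.mean(preds))
```

Each iteration averages over uncountably many output-layer settings at the cost of one linear solve, which is the sample-efficiency intuition behind collapsed samples.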


A locally time-invariant metric for climate model ensemble predictions of extreme risk

Virdee, Mala, Kaiser, Markus, Shuckburgh, Emily, Ek, Carl Henrik, Kazlauskaite, Ieva

arXiv.org Artificial Intelligence

Adaptation-relevant predictions of climate change are often derived by combining climate model simulations in a multi-model ensemble. Model evaluation methods used in performance-based ensemble weighting schemes have limitations in the context of high-impact extreme events. We introduce a locally time-invariant method for evaluating climate model simulations with a focus on assessing the simulation of extremes. We explore the behaviour of the proposed method in predicting extreme heat days in Nairobi and provide comparative results for eight additional cities.
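For context, the performance-based weighting schemes whose limitations motivate this work can be sketched as skill weights over ensemble members, e.g. weights inversely proportional to each model's RMSE against observations. The data below are synthetic and the scheme is a generic baseline, not the paper's proposed metric:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated daily max temperatures: observations plus three model runs,
# each with its own bias and noise level.
obs = 25 + 5 * np.sin(np.linspace(0, 6 * np.pi, 360)) + rng.normal(0, 1, 360)
models = np.stack([obs + rng.normal(bias, spread, 360)
                   for bias, spread in [(0.5, 1.0), (2.0, 2.0), (-1.0, 1.5)]])

# A standard performance-based ensemble weighting: inverse-RMSE weights.
# Note this scores average skill over all days, which is exactly why such
# metrics can underweight skill at rare extremes.
rmse = np.sqrt(np.mean((models - obs) ** 2, axis=1))
weights = (1 / rmse) / np.sum(1 / rmse)
ensemble = weights @ models  # weighted multi-model prediction
```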


Guiding the Sequential Experiments in Autonomous Experimentation Platforms through EI-based Bayesian Optimization and Bayesian Model Averaging

Raihan, Ahmed Shoyeb, Ahmed, Imtiaz

arXiv.org Artificial Intelligence

Autonomous Experimentation Platforms (AEPs) are advanced manufacturing platforms that, under intelligent control, can sequentially search the material design space (MDS) and identify parameters with the desired properties. At the heart of the intelligent control of these AEPs is the policy guiding the sequential experiments, which chooses the location of the next experiment; in doing so, a balance between exploitation and exploration must be achieved. A Bayesian Optimization (BO) framework with an Expected Improvement based (EI-based) acquisition function can effectively search the MDS and guide where to conduct the next experiments so that the underlying relationship can be identified with a smaller number of experiments. The traditional BO framework optimizes a black-box objective function sequentially by relying on a single model; this single-model approach does not account for model uncertainty. Bayesian Model Averaging (BMA) addresses this issue by working with multiple models and thus considering the uncertainty across them. In this work, we first apply the conventional BO algorithm with the most popular EI-based experiment policy to a real-life steel fatigue dataset to predict fatigue strength. Afterward, we apply BMA to the same dataset by working with a set of predictive models and compare its performance with that of the traditional single-model BO algorithm. Comparing the results in terms of RMSE, we find that BMA outperforms EI-based BO in the prediction task by accounting for model uncertainty in its framework.
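The EI acquisition that drives such a BO loop has a standard closed form under a Gaussian surrogate posterior; a minimal, library-free sketch (the `xi` exploration parameter is a common convention, not something specific to this paper):

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mu, sigma, best, xi=0.01):
    # EI for maximization at a candidate point, given the surrogate's
    # posterior mean mu and std sigma there, and the best value found so far.
    if sigma <= 0.0:
        return 0.0
    z = (mu - best - xi) / sigma
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))      # standard normal CDF
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)    # standard normal PDF
    # Exploitation term (mean above incumbent) plus exploration term (uncertainty).
    return (mu - best - xi) * Phi + sigma * phi

# The next experiment is the candidate maximizing EI over the design space.
```

The two terms make the exploitation/exploration balance explicit: EI grows both with a higher predicted mean and with higher predictive uncertainty, which is what lets the policy probe unexplored regions of the MDS.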


Self-supervised Graph Representation Learning for Black Market Account Detection

Xu, Zequan, Li, Lianyun, Li, Hui, Sun, Qihang, Hu, Shaofeng, Ji, Rongrong

arXiv.org Artificial Intelligence

Nowadays, Multi-purpose Messaging Mobile Apps (MMMAs) have become increasingly prevalent. MMMAs attract fraudsters, and some cybercriminals provide support for frauds via black market accounts (BMAs). Compared to fraudsters, BMAs are not directly involved in frauds and are more difficult to detect. This paper describes our BMA detection system SGRL (Self-supervised Graph Representation Learning), used in WeChat, a representative MMMA with over a billion users. We tailor Graph Neural Networks and Graph Self-supervised Learning in SGRL for BMA detection. The workflow of SGRL contains a pretraining phase that utilizes structural information, node attribute information, and available human knowledge, followed by a lightweight detection phase. In offline experiments, SGRL outperforms state-of-the-art methods by 16.06%-58.17% on offline evaluation measures. We deploy SGRL in the online environment to detect BMAs on the billion-scale WeChat graph, where it exceeds the alternative by 7.27% on the online evaluation measure. In conclusion, SGRL can alleviate label reliance, generalize well to unseen data, and effectively detect BMAs in WeChat.


These AI projects are improving cancer screening and outcomes

#artificialintelligence

Microsoft is using machine learning (ML) and natural language processing (NLP) to help the world's leading oncologists figure out the most effective, individualized cancer treatment for their patients. An innovation named Inner Eye pairs ML with computer vision to give radiologists a more detailed understanding of how their patients' tumours are progressing. It is being used by Addenbrooke's Hospital in Cambridge to develop AI models that use the hospital's own data to automatically highlight tumours and healthy organs on patient scans. BC Cancer and Microsoft Canada are collaborating on "Single Cell Genomics", which will enable medical professionals to view the genomes of single cancerous cells. This level of detail will enable targeted, specific combinations of treatments for individuals and allow oncologists to predict how individual cells within a patient's tumour will respond to chemotherapy.