
Collaborating Authors

agop


Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

Bernal, Marcel Tomàs, Mallinar, Neil Rohit, Belkin, Mikhail

arXiv.org Machine Learning

Grokking occurs when a model achieves high training accuracy long before it generalizes to unseen test points. This phenomenon was initially observed on a class of algebraic problems, such as learning modular arithmetic (Power et al., 2022). We study grokking on algebraic tasks in a class of feature learning kernels via the Recursive Feature Machine (RFM) algorithm (Radhakrishnan et al., 2024), which iteratively updates feature matrices through the Average Gradient Outer Product (AGOP) of an estimator in order to learn task-relevant features. Our main experimental finding is that generalization occurs only when a certain symmetry in the training set is broken. Furthermore, we show empirically that RFM generalizes by recovering the underlying invariance group action inherent in the data. We find that the learned feature matrices encode specific elements of the invariance group, explaining the dependence of generalization on symmetry.
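The RFM loop described in the abstract can be sketched in a few lines. The kernel choice (a Mahalanobis Laplace kernel), bandwidth, ridge regularization, and the finite-difference gradient below are illustrative assumptions for this sketch, not the reference implementation:

```python
import numpy as np

def laplace_kernel(X, Z, M, bandwidth=1.0):
    """Mahalanobis Laplace kernel: k(x, z) = exp(-||x - z||_M / bandwidth)."""
    diffs = X[:, None, :] - Z[None, :, :]
    sq = np.einsum('nmd,de,nme->nm', diffs, M, diffs)
    return np.exp(-np.sqrt(np.maximum(sq, 0.0)) / bandwidth)

def predict(Xq, Xtr, alpha, M):
    """Kernel predictor f(x) = sum_i alpha_i k(x, x_i)."""
    return laplace_kernel(Xq, Xtr, M) @ alpha

def agop(Xq, Xtr, alpha, M, eps=1e-4):
    """Average Gradient Outer Product of the predictor, with gradients
    taken by central finite differences (an illustrative shortcut)."""
    n, d = Xq.shape
    G = np.zeros((n, d))
    for j in range(d):
        e = np.zeros(d); e[j] = eps
        G[:, j] = (predict(Xq + e, Xtr, alpha, M)
                   - predict(Xq - e, Xtr, alpha, M)) / (2 * eps)
    return G.T @ G / n

def rfm(X, y, iters=3, reg=1e-3):
    """Minimal RFM sketch: alternate kernel ridge regression with an
    AGOP update of the feature matrix M."""
    n, d = X.shape
    M = np.eye(d)
    for _ in range(iters):
        K = laplace_kernel(X, X, M)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)
        M = agop(X, X, alpha, M)
        M /= np.trace(M) + 1e-12   # normalize scale for stability
    return M, alpha
```

On a toy regression target that depends only on the first coordinate, the learned feature matrix concentrates its mass on that coordinate, which is the sense in which AGOP recovers task-relevant directions.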



Average gradient outer product as a mechanism for deep neural collapse

Neural Information Processing Systems

Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. Deep Recursive Feature Machines construct a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate theoretically and empirically that DNC occurs in Deep Recursive Feature Machines as a consequence of the projection with the AGOP matrix computed at each layer. We then provide evidence that this mechanism holds for neural networks more generally. We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.
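The layer construction described in the abstract, mapping the data with the AGOP and then applying an untrained random feature map, admits a short sketch. The symmetric square root and the random ReLU feature map below are assumptions chosen for concreteness, not necessarily the paper's exact choices:

```python
import numpy as np

def deep_rfm_layer(X, grads, width=128, rng=None):
    """One layer in the style of Deep Recursive Feature Machines:
    project the data with the matrix square root of the AGOP, then
    apply an untrained random feature map. `grads` holds per-sample
    input-output gradients of the current predictor."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    agop = grads.T @ grads / n                        # uncentered covariance of gradients
    w, V = np.linalg.eigh(agop)                       # symmetric square root
    sqrt_agop = (V * np.sqrt(np.maximum(w, 0.0))) @ V.T
    Z = X @ sqrt_agop                                 # AGOP projection
    W = rng.standard_normal((d, width)) / np.sqrt(d)  # untrained random weights
    return np.maximum(Z @ W, 0.0)                     # random ReLU features
```

Stacking such layers, each AGOP projection compresses within-class variability in the directions the predictor ignores, which is the mechanism the paper connects to neural collapse.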



xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

Beaglehole, Daniel, Holzmüller, David, Radhakrishnan, Adityanarayanan, Belkin, Mikhail

arXiv.org Machine Learning

Tabular data - collections of continuous and categorical variables organized into matrices - underlies all aspects of modern commerce and science from airplane engines to biology labs to bagel shops. Yet, while Machine Learning and AI for language and vision have seen unprecedented progress, the primary methodologies of prediction from tabular data have been relatively static, dominated by variations of Gradient Boosted Decision Trees (GBDTs), such as XGBoost [7]. Nevertheless, hundreds of tabular datasets have been assembled to form extensive regression and classification benchmarks [11, 12, 16, 35, 37], and, recently, there has been renewed interest in building state-of-the-art predictive models for tabular data [15, 18, 19]. Notably, given the remarkable effectiveness of large, "foundation" models for text, there has been much excitement in developing similar models on tabular data, and recent effort has led to the development of TabPFN-v2, a foundation model for tabular data appearing in Nature [18]. Yet, despite this progress, tabular data remains an active area for model development, and building scalable, effective, and interpretable machine learning models in this domain is still an open challenge. In this work, we introduce xRFM, a tabular predictive model that combines recent advances in feature learning kernel machines with an adaptive tree structure, making it effective, scalable, and interpretable.



FACT: the Features At Convergence Theorem for neural networks

Boix-Adsera, Enric, Mallinar, Neil, Simon, James B., Belkin, Mikhail

arXiv.org Machine Learning

A central challenge in deep learning theory is to understand how neural networks learn and represent features. To this end, we prove the Features at Convergence Theorem (FACT), which gives a self-consistency equation that neural network weights satisfy at convergence when trained with nonzero weight decay. For each weight matrix $W$, this equation relates the "feature matrix" $W^\top W$ to the set of input vectors passed into the matrix during forward propagation and the loss gradients passed through it during backpropagation. We validate this relation empirically, showing that neural features indeed satisfy the FACT at convergence. Furthermore, by modifying the "Recursive Feature Machines" of Radhakrishnan et al. (2024) so that they obey the FACT, we arrive at a new learning algorithm, FACT-RFM. FACT-RFM achieves high performance on tabular data and captures various feature learning behaviors that occur in neural network training, including grokking in modular arithmetic and phase transitions in learning sparse parities.



Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers

Beaglehole, Daniel, Radhakrishnan, Adityanarayanan, Boix-Adserà, Enric, Belkin, Mikhail

arXiv.org Machine Learning

A trained Large Language Model (LLM) contains much of human knowledge. Yet, it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always "know what they know" and may even be actively misleading. In this work, we give a general method for detecting semantic concepts in the internal activations of LLMs. Furthermore, we show that our methodology can be easily adapted to steer LLMs toward desirable outputs. Our innovations are the following: (1) we use a nonlinear feature learning method to identify important linear directions for predicting concepts from each layer; (2) we aggregate features across layers to build powerful concept detectors and steering mechanisms. We showcase the power of our approach by attaining state-of-the-art results for detecting hallucinations, harmfulness, toxicity, and untruthful content on seven benchmarks. We highlight the generality of our approach by steering LLMs towards new concepts that, to the best of our knowledge, have not been previously considered in the literature, including: semantic disambiguation, human languages, programming languages, hallucinated responses, science subjects, poetic/Shakespearean English, and even multiple concepts simultaneously. Moreover, our method can steer concepts with numerical attributes such as product reviews. We provide our code (including a simple API for our methods) at https://github.com/dmbeaglehole/neural_controllers.
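The two-step recipe in the abstract, per-layer concept directions followed by cross-layer aggregation, can be illustrated schematically. The paper finds directions with a nonlinear feature-learning method; plain ridge regression is used below only as a stand-in, and all function names are illustrative:

```python
import numpy as np

def layer_directions(acts_by_layer, labels, reg=1e-2):
    """Fit one linear concept direction per layer of activations.
    Ridge regression here is a simplifying stand-in for the paper's
    nonlinear feature-learning method."""
    dirs = []
    for A in acts_by_layer:              # A: (n_samples, d) activations at a layer
        d = A.shape[1]
        w = np.linalg.solve(A.T @ A + reg * np.eye(d), A.T @ labels)
        dirs.append(w)
    return dirs

def aggregate_concept_scores(acts_by_layer, dirs):
    """Combine per-layer linear scores into one concept score per sample."""
    scores = np.stack([A @ w for A, w in zip(acts_by_layer, dirs)])
    return scores.mean(axis=0)
```

For steering, the same per-layer directions can be added (scaled) to the residual-stream activations at generation time; the detection step above is where the cross-layer aggregation pays off.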


Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Mallinar, Neil, Beaglehole, Daniel, Zhu, Libin, Radhakrishnan, Adityanarayanan, Pandit, Parthe, Belkin, Mikhail

arXiv.org Machine Learning

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy. It is often taken as an example of "emergence", where model ability manifests sharply through a phase transition. In this work, we show that the phenomenon of grokking is not specific to neural networks nor to gradient descent-based optimization. Specifically, we show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines (RFM), an iterative algorithm that uses the Average Gradient Outer Product (AGOP) to enable task-specific feature learning with general machine learning models. When used in conjunction with kernel machines, iterating RFM results in a fast transition from random, near-zero test accuracy to perfect test accuracy. This transition cannot be predicted from the training loss, which is identically zero, nor from the test loss, which remains constant in initial iterations. Instead, as we show, the transition is completely determined by feature learning: RFM gradually learns block-circulant features to solve modular arithmetic. Paralleling the results for RFM, we show that neural networks that solve modular arithmetic also learn block-circulant features. Furthermore, we present theoretical evidence that RFM uses such block-circulant features to implement the Fourier Multiplication Algorithm, which prior work posited as the generalizing solution neural networks learn on these tasks. Our results demonstrate that emergence can result purely from learning task-relevant features and is not specific to neural architectures or gradient descent-based optimization methods. Furthermore, our work provides more evidence for AGOP as a key mechanism for feature learning in neural networks.
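The role of block-circulant features rests on a classical fact behind the Fourier Multiplication Algorithm: every circulant matrix is diagonalized by the discrete Fourier transform, so a circulant feature acting on one-hot encodings of residues implements circular convolution, i.e. addition mod p, coordinate-wise in Fourier space. A minimal numerical check of the diagonalization:

```python
import numpy as np

def circulant(c):
    """Circulant matrix with entries C[i, j] = c[(i - j) % p]."""
    p = len(c)
    return np.array([[c[(i - j) % p] for j in range(p)] for i in range(p)])

p = 7
C = circulant(np.random.default_rng(0).standard_normal(p))

# The DFT matrix diagonalizes every circulant matrix: F C F^{-1} is diagonal,
# with the eigenvalues given by the DFT of the generating vector.
F = np.fft.fft(np.eye(p))                       # p x p DFT matrix
D = F @ C @ np.linalg.inv(F)
off_diag = np.abs(D - np.diag(np.diag(D))).max()
print(off_diag)                                 # numerically zero
```

Because the features are only circulant up to permutation and blocking in practice, the paper's analysis works with block-circulant structure, but the Fourier-diagonalization mechanism sketched here is the same.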