AITopics | preactivation

3edb234091dca2023308398dbf824850-Paper-Conference.pdf

Neural Information Processing SystemsJun-16-2026, 17:23:00 GMT

We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate, through multi-level analyses spanning neurons, neuron clusters, and entire networks, that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory works for deep neural networks (DNNs). It predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only O(log(n))features, a result we empirically confirm. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition. It advances generalizable interpretability and opens a testable universality hypothesis for group multiplication beyond modular addition.

artificial intelligence, machine learning, neuron, (19 more...)

Neural Information Processing Systems

Country: North America > Canada (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

1baff70e2669e8376347efd3a874a341-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 23:10:37 GMT

artificial intelligence, linear network, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

7e02f2910ea7911a37c4691f4201c878-Paper-Conference.pdf

Neural Information Processing SystemsFeb-15-2026, 11:53:36 GMT

artificial intelligence, layernorm, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

1baff70e2669e8376347efd3a874a341-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 17:16:19 GMT

linear network, neural network, relu network, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications

Neural Information Processing SystemsDec-26-2025, 05:28:17 GMT

Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and quantitatively predictive description is possible. Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rate. These criteria rely on the notion of criticality defined for deep neural networks. In this work we describe a new practical way to diagnose criticality.

critical initialization, partial jacobian, wide and deep neural network, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A mathematical theory for understanding when abstract representations emerge in neural networks

Wang, Bin, Johnston, W. Jeffrey, Fusi, Stefano

arXiv.org Machine LearningOct-15-2025

Recent experiments reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space. These disentangled low-dimensional representations are observed in multiple brain areas and across different species, and are typically the result of a process of abstraction that supports simple forms of out-of-distribution generalization. The mechanisms by which such geometries emerge remain poorly understood, and the mechanisms that have been investigated are typically unsupervised (e.g., based on variational auto-encoders). Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These abstract representations reflect the structure of the desired outputs or the semantics of the input stimuli. To investigate the neural representations that emerge in these networks, we develop an analytical framework that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations. Applying this framework to a finite-width ReLU network, we find that its hidden layer exhibits an abstract representation at all global minima of the task objective. We further extend these analyses to two broad families of activation functions and deep feedforward architectures, demonstrating that abstract representations naturally arise in all these scenarios. Together, these results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks, as well as a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Machine Learning

2510.09816

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture

Dunbar, John, Aaronson, Scott

arXiv.org Machine LearningOct-9-2025

We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[σ(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.

activation function, neural network, retrieved, (14 more...)

arXiv.org Machine Learning

2510.06527

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications

Neural Information Processing SystemsOct-8-2025, 23:32:47 GMT

Deep neural networks are notorious for defying theoretical treatment.

artificial intelligence, layernorm, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BaB-prob: Branch and Bound with Preactivation Splitting for Probabilistic Verification of Neural Networks

Wang, Fangji, Tsiotras, Panagiotis

arXiv.org Machine LearningOct-1-2025

Branch-and-bound with preactivation splitting has been shown highly effective for deterministic verification of neural networks. In this paper, we extend this framework to the probabilistic setting. We propose BaB-prob that iteratively divides the original problem into subproblems by splitting preactivations and leverages linear bounds computed by linear bound propagation to bound the probability for each subproblem. We prove soundness and completeness of BaB-prob for feedforward-ReLU neural networks. Furthermore, we introduce the notion of uncertainty level and design two efficient strategies for preactivation splitting, yielding BaB-prob-ordered and BaB+BaBSR-prob. We evaluate BaB-prob on untrained networks, MNIST and CIFAR-10 models, respectively, and VNN-COMP 2025 benchmarks. Across these settings, our approach consistently outperforms state-of-the-art approaches in medium- to high-dimensional input problems.

equation, preactivation, probability, (17 more...)

arXiv.org Machine Learning

2509.25647

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks

McCracken, Gavin, Moisescu-Pareja, Gabriela, Letourneau, Vincent, Precup, Doina, Love, Jonathan

arXiv.org Artificial IntelligenceMay-27-2025

We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate - through multi-level analyses spanning neurons, neuron clusters, and entire networks - that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory works for deep neural networks (DNNs). It predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only O(log n) features, a result we empirically confirm. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition. It advances generalizable interpretability and opens a testable universality hypothesis for group multiplication beyond modular addition.

artificial intelligence, machine learning, neuron, (18 more...)

arXiv.org Artificial Intelligence

2505.18266

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

Filters

Collaborating Authors

preactivation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

3edb234091dca2023308398dbf824850-Paper-Conference.pdf

1baff70e2669e8376347efd3a874a341-Paper.pdf

7e02f2910ea7911a37c4691f4201c878-Paper-Conference.pdf

1baff70e2669e8376347efd3a874a341-Paper.pdf

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications

A mathematical theory for understanding when abstract representations emerge in neural networks

Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture

Critical Initialization of Wide and Deep Neural Networks using Partial Jacobians: General Theory and Applications

BaB-prob: Branch and Bound with Preactivation Splitting for Probabilistic Verification of Neural Networks

Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks