Generalization in multitask deep neural classifiers: a statistical physics approach
We would first like to thank all three reviewers for their thorough, constructive, and considered reviews. As noted in Appendix A, our model is a nonequilibrium variant of Derrida's Random Energy Model; we will update the final manuscript to describe this analogy more explicitly. As such, this is still a matter of active research. On the conditions claimed in L181-184: we will amend the manuscript to indicate that the equation directly preceding eqn.
Reviews: Generalization in multitask deep neural classifiers: a statistical physics approach
The experiments on multitask learning are informative, though I wish the experiments and theory were a bit more integrated; see my comments below for more details. The authors moved a lot of details to the appendix while keeping the main conclusions in the main submission to ease understanding. Here are some examples: (a) L181-184: which equation shows that (s_A - \tilde{s_A}) depends on the four quantities mentioned; (b) L185-186: when labelled data is scarce, why is (\bar{s_A*g(s_A)} - \tilde{s_A*g(s_A)}) equal to 0; (c) L189-190: why does (\bar{s_A*g(s_A)} - \tilde{s_A*g(s_A)}) tend to 0 when training data is abundant.
This paper is a nice combination of theoretical understanding and simple experiments that verify it in the case of multitask learning in neural nets. Given that not much is known in this space, this work can be impactful. I suggest the authors add a few multi-task experiments with real datasets to verify their understanding.
Generalization in multitask deep neural classifiers: a statistical physics approach
A proper understanding of the striking generalization abilities of deep neural networks presents an enduring puzzle. Recently, there has been a growing body of numerically-grounded theoretical work that has contributed important insights to the theory of learning in deep neural nets. There has also been a recent interest in extending these analyses to understanding how multitask learning can further improve the generalization capacity of deep neural nets. These studies deal almost exclusively with regression tasks which are amenable to existing analytical techniques. We develop an analytic theory of the nonlinear dynamics of generalization of deep neural networks trained to solve classification tasks using softmax outputs and cross-entropy loss, addressing both single task and multitask settings.
A statistical physics approach to learning curves for the Inverse Ising problem
Bachschmid-Romano, Ludovica, Opper, Manfred
Using methods of statistical physics, we analyse the error of learning couplings in large Ising models from independent data (the inverse Ising problem). We concentrate on learning based on local cost functions, such as the pseudo-likelihood method, for which the couplings are inferred independently for each spin. Assuming that the data are generated from a true Ising model, we compute the reconstruction error of the couplings using a combination of the replica method with the cavity approach for densely connected systems. We show that an explicit estimator based on a quadratic cost function achieves minimal reconstruction error, but requires the length of the true coupling vector as prior knowledge. A simple mean field estimator of the couplings which does not need such knowledge is asymptotically optimal, i.e. when the number of observations is much larger than the number of spins. Comparison of the theory with numerical simulations shows excellent agreement for data generated from two models with random couplings in the high temperature region: a model with independent couplings (Sherrington-Kirkpatrick model), and a model where the matrix of couplings has a Wishart distribution.
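The pseudo-likelihood approach discussed in this abstract reduces, for each spin, to fitting that spin's conditional distribution given all the others — essentially a logistic regression per spin. A minimal sketch, assuming ±1 spins and plain gradient ascent; the function name, learning rate, and optimizer here are illustrative choices of ours, not the paper's replica/cavity machinery:

```python
import numpy as np

def pseudolikelihood_fit(samples, spin, lr=0.1, epochs=1000):
    """Infer the couplings J feeding into one spin by maximizing the
    conditional (pseudo-)likelihood P(s_i | s_rest) = sigma(2 * s_i * h_i),
    where h_i = sum_j J_ij s_j and sigma is the logistic function.
    `samples` is an (M, N) array of +/-1 spin configurations."""
    n_samples, n_spins = samples.shape
    others = np.delete(np.arange(n_spins), spin)
    X = samples[:, others]      # neighbouring spins, shape (M, N-1)
    s = samples[:, spin]        # the conditioned spin, shape (M,)
    J = np.zeros(n_spins - 1)
    for _ in range(epochs):
        h = X @ J                                   # local fields
        # gradient of mean log sigma(2 s h):  2 s x * sigma(-2 s h)
        grad = (2.0 * s / (1.0 + np.exp(2.0 * s * h))) @ X / n_samples
        J += lr * grad
    return J
```

Because each spin is fitted independently, running this for every `spin` index recovers the full (asymmetric, to-be-symmetrized) coupling matrix; this is the locality that makes the per-spin analysis in the abstract tractable.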
Distributed Aggregation in the Presence of Uncertainty: A Statistical Physics Approach
Hsieh, Mong-ying Ani (Drexel University) | Mather, Thomas William (Drexel University)
We present a statistical physics inspired approach to modeling, analysis, and design of distributed aggregation control policies for teams of homogeneous and heterogeneous robots. We assume high-level agent behavior can be described as a sequential composition of lower-level behavioral primitives. Aggregation or division of the collective into distinct clusters is achieved by developing a macroscopic description of the ensemble dynamics. The advantages of this approach are twofold: 1) the derivation of a low dimensional but highly predictive description of the collective dynamics and 2) a framework where interaction uncertainties between the low-level components can be explicitly modeled and controlled. Additionally, classical dynamical systems theory and control theoretic techniques can be used to analyze and shape the collective dynamics of the system. We consider the aggregation problem for homogeneous agents into clusters located at distinct regions in the workspace and discuss the extension to heterogeneous teams of autonomous agents. We show how a macroscopic model of the aggregation dynamics can be derived from agent-level behaviors and discuss the synthesis of distributed coordination strategies in the presence of uncertainty.
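In its simplest mean-field reading, the macroscopic description mentioned above amounts to linear rate equations for the expected fraction of robots at each site, exactly as in chemical kinetics. A toy sketch under that assumption — the three-site transition rates and function names below are invented for illustration, not taken from the paper:

```python
import numpy as np

def rate_matrix(rates):
    """Build the generator K of dx/dt = K x from per-edge rates,
    where rates[i][j] is the per-robot rate of hopping site j -> i.
    The diagonal holds minus the total outflow rate of each site,
    so the columns of K sum to zero and total population is conserved."""
    K = np.array(rates, dtype=float)
    np.fill_diagonal(K, 0.0)
    K -= np.diag(K.sum(axis=0))
    return K

def simulate(K, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of the macroscopic rate equation
    dx/dt = K x; returns the population fractions at the final time."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += dt * (K @ x)
    return x
```

The low-dimensional payoff is visible here: a swarm of any size is summarized by one small linear ODE, and shaping the steady state (the null vector of `K`) reduces to choosing the transition rates — a standard control-design handle.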
Self-Organizing Rules for Robust Principal Component Analysis
Principal Component Analysis (PCA) is an essential technique for data compression and feature extraction, and has been widely used in statistical data analysis, communication theory, pattern recognition and image processing. In the neural network literature, a lot of studies have been made on learning rules for implementing PCA or on networks closely related to PCA (see Xu & Yuille, 1993 for a detailed reference list which contains more than 30 papers related to these issues).
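As a concrete instance of the neural learning rules for PCA surveyed here, Oja's single-neuron rule extracts the first principal component via Hebbian updates with a built-in normalization term. A minimal sketch — we chose Oja's plain rule as the example; it is not one of the robust variants this paper itself proposes:

```python
import numpy as np

def oja_first_pc(X, lr=0.005, epochs=50, seed=0):
    """Estimate the first principal component of (approximately centred)
    data X, one row per sample, with Oja's rule:
        w += lr * y * (x - y * w),   y = w . x
    The Hebbian term lr*y*x grows w along high-variance directions, while
    the -lr*y^2*w term keeps ||w|| near 1 without an explicit constraint."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += lr * y * (x - y * w)
    return w / np.linalg.norm(w)
```

The recovered direction is defined only up to sign, as with any principal component; batch PCA via the covariance eigendecomposition gives the same answer, but the online form above is what maps onto a single linear neuron.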