Generalization in multitask deep neural classifiers: a statistical physics approach
We would first like to thank all three reviewers for their thorough, constructive, and considered reviews. As noted in Appendix A, our model is a nonequilibrium variant of Derrida's Random Energy Model; we will update the final manuscript to describe this analogy more explicitly. As such, this is still a matter of active research. On the conditions claimed in L181-184: we will amend the manuscript to indicate that the equation directly preceding eqn.
Reviews: Generalization in multitask deep neural classifiers: a statistical physics approach
The experiments on multitask learning are informative, though I wish the experiments and theory were a bit more integrated; see my comments below for more details. The authors moved a lot of details to the appendix while keeping the main conclusions in the main submission to ease understanding. Here are some examples: (a) L181-184: which equation shows that (s_A - \tilde{s_A}) depends on the four quantities mentioned; (b) L185-186: when labelled data is scarce, why is (\bar{s_A*g(s_A)} - \tilde{s_A*g(s_A)}) equal to 0; (c) L189-190: why does (\bar{s_A*g(s_A)} - \tilde{s_A*g(s_A)}) tend to 0 when training data is abundant.
This paper is a nice combination of theoretical understanding and simple experiments that verify it in the case of multitask learning in neural nets. Given that not much is known in this space, this work can be impactful. I suggest the authors add a few multi-task experiments with real datasets to verify their understanding.
Generalization in multitask deep neural classifiers: a statistical physics approach
A proper understanding of the striking generalization abilities of deep neural networks presents an enduring puzzle. Recently, there has been a growing body of numerically-grounded theoretical work that has contributed important insights to the theory of learning in deep neural nets. There has also been a recent interest in extending these analyses to understanding how multitask learning can further improve the generalization capacity of deep neural nets. These studies deal almost exclusively with regression tasks which are amenable to existing analytical techniques. We develop an analytic theory of the nonlinear dynamics of generalization of deep neural networks trained to solve classification tasks using softmax outputs and cross-entropy loss, addressing both single task and multitask settings.
A statistical physics approach to learning curves for the Inverse Ising problem
Bachschmid-Romano, Ludovica, Opper, Manfred
Using methods of statistical physics, we analyse the error of learning couplings in large Ising models from independent data (the inverse Ising problem). We concentrate on learning based on local cost functions, such as the pseudo-likelihood method, for which the couplings are inferred independently for each spin. Assuming that the data are generated from a true Ising model, we compute the reconstruction error of the couplings using a combination of the replica method with the cavity approach for densely connected systems. We show that an explicit estimator based on a quadratic cost function achieves minimal reconstruction error, but requires the length of the true coupling vector as prior knowledge. A simple mean field estimator of the couplings which does not need such knowledge is asymptotically optimal, i.e. when the number of observations is much larger than the number of spins. Comparison of the theory with numerical simulations shows excellent agreement for data generated from two models with random couplings in the high temperature region: a model with independent couplings (Sherrington-Kirkpatrick model), and a model where the matrix of couplings has a Wishart distribution.
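The pseudo-likelihood approach discussed in this abstract reduces, for each spin, to fitting that spin's conditional distribution given all the others — essentially a logistic regression per spin. A minimal sketch, assuming ±1 spins and plain gradient ascent; the function name, learning rate, and optimizer here are illustrative choices of ours, not the paper's replica/cavity machinery:

```python
import numpy as np

def pseudolikelihood_fit(samples, spin, lr=0.1, epochs=1000):
    """Infer the couplings J feeding into one spin by maximizing the
    conditional (pseudo-)likelihood P(s_i | s_rest) = sigma(2 * s_i * h_i),
    where h_i = sum_j J_ij s_j and sigma is the logistic function.
    `samples` is an (M, N) array of +/-1 spin configurations."""
    n_samples, n_spins = samples.shape
    others = np.delete(np.arange(n_spins), spin)
    X = samples[:, others]      # neighbouring spins, shape (M, N-1)
    s = samples[:, spin]        # the conditioned spin, shape (M,)
    J = np.zeros(n_spins - 1)
    for _ in range(epochs):
        h = X @ J                                   # local fields
        # gradient of mean log sigma(2 s h):  2 s x * sigma(-2 s h)
        grad = (2.0 * s / (1.0 + np.exp(2.0 * s * h))) @ X / n_samples
        J += lr * grad
    return J
```

Because each spin is fitted independently, running this for every `spin` index recovers the full (asymmetric, to-be-symmetrized) coupling matrix; this is the locality that makes the per-spin analysis in the abstract tractable.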
Distributed Aggregation in the Presence of Uncertainty: A Statistical Physics Approach
Hsieh, Mong-ying Ani (Drexel University) | Mather, Thomas William (Drexel University)
We present a statistical physics inspired approach to modeling, analysis, and design of distributed aggregation control policies for teams of homogeneous and heterogeneous robots. We assume high-level agent behavior can be described as a sequential composition of lower-level behavioral primitives. Aggregation or division of the collective into distinct clusters is achieved by developing a macroscopic description of the ensemble dynamics. The advantages of this approach are twofold: 1) the derivation of a low dimensional but highly predictive description of the collective dynamics and 2) a framework where interaction uncertainties between the low-level components can be explicitly modeled and controlled. Additionally, classical dynamical systems theory and control theoretic techniques can be used to analyze and shape the collective dynamics of the system. We consider the aggregation problem for homogeneous agents into clusters located at distinct regions in the workspace and discuss the extension to heterogeneous teams of autonomous agents. We show how a macroscopic model of the aggregation dynamics can be derived from agent-level behaviors and discuss the synthesis of distributed coordination strategies in the presence of uncertainty.
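In its simplest mean-field reading, the macroscopic description mentioned above amounts to linear rate equations for the expected fraction of robots at each site, exactly as in chemical kinetics. A toy sketch under that assumption — the three-site transition rates and function names below are invented for illustration, not taken from the paper:

```python
import numpy as np

def rate_matrix(rates):
    """Build the generator K of dx/dt = K x from per-edge rates,
    where rates[i][j] is the per-robot rate of hopping site j -> i.
    The diagonal holds minus the total outflow rate of each site,
    so the columns of K sum to zero and total population is conserved."""
    K = np.array(rates, dtype=float)
    np.fill_diagonal(K, 0.0)
    K -= np.diag(K.sum(axis=0))
    return K

def simulate(K, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of the macroscopic rate equation
    dx/dt = K x; returns the population fractions at the final time."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += dt * (K @ x)
    return x
```

The low-dimensional payoff is visible here: a swarm of any size is summarized by one small linear ODE, and shaping the steady state (the null vector of `K`) reduces to choosing the transition rates — a standard control-design handle.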
Self-Organizing Rules for Robust Principal Component Analysis
Principal Component Analysis (PCA) is an essential technique for data compression and feature extraction, and has been widely used in statistical data analysis, communication theory, pattern recognition and image processing. In the neural network literature, a lot of studies have been made on learning rules for implementing PCA or on networks closely related to PCA (see Xu & Yuille, 1993 for a detailed reference list which contains more than 30 papers related to these issues).
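As a concrete instance of the neural learning rules for PCA surveyed here, Oja's single-neuron rule extracts the first principal component via Hebbian updates with a built-in normalization term. A minimal sketch — we chose Oja's plain rule as the example; it is not one of the robust variants this paper itself proposes:

```python
import numpy as np

def oja_first_pc(X, lr=0.005, epochs=50, seed=0):
    """Estimate the first principal component of (approximately centred)
    data X, one row per sample, with Oja's rule:
        w += lr * y * (x - y * w),   y = w . x
    The Hebbian term lr*y*x grows w along high-variance directions, while
    the -lr*y^2*w term keeps ||w|| near 1 without an explicit constraint."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += lr * y * (x - y * w)
    return w / np.linalg.norm(w)
```

The recovered direction is defined only up to sign, as with any principal component; batch PCA via the covariance eigendecomposition gives the same answer, but the online form above is what maps onto a single linear neuron.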