spin glass
Exploring Loss Landscapes through the Lens of Spin Glass Theory
Liao, Hao, Zhang, Wei, Huang, Zhanyi, Long, Zexiao, Zhou, Mingyang, Wu, Xiaoqun, Mao, Rui, Yeung, Chi Ho
In the past decade, significant strides in deep learning have led to numerous groundbreaking applications. Despite these advancements, our understanding of the high generalizability of deep learning, especially in such an over-parametrized space, remains limited. For instance, in deep neural networks (DNNs), the internal representations, decision-making mechanisms, absence of overfitting in an over-parametrized space, and superior generalizability remain poorly understood. Successful applications are often regarded as empirical rather than scientific achievements. This paper delves into the loss landscape of DNNs through the lens of spin glass in statistical physics, a system characterized by a complex energy landscape with numerous metastable states, as a novel perspective on how DNNs work. We investigated the loss landscape of single-hidden-layer neural networks activated by the Rectified Linear Unit (ReLU) function, and introduced several protocols to examine the analogy between DNNs and spin glass. Specifically, we used (1) random walks in the parameter space of DNNs to unravel the structure of their loss landscape; (2) a permutation-interpolation protocol to study the connection between copies of identical regions in the loss landscape arising from the permutation symmetry of the hidden layer; (3) hierarchical clustering to reveal the hierarchy among trained solutions of DNNs, reminiscent of the Replica Symmetry Breaking (RSB) phenomenon (i.e. the Parisi solution) in spin glass; and (4) an examination of the relationship between the ruggedness of a DNN's loss landscape and its generalizability, showing that generalizability improves as minima become flatter.
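As a rough illustration of protocol (1), the sketch below runs a random walk over the parameters of a single-hidden-layer ReLU network and records the loss along the trajectory. The data, network size, step size, and walk length are placeholder assumptions; the paper's actual protocol and measurements are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumption, not the paper's dataset).
X = rng.standard_normal((200, 10))
y = np.sin(X @ rng.standard_normal(10))

def loss(params, X, y, n_hidden=16):
    """MSE loss of a single-hidden-layer ReLU network with flattened parameters."""
    d = X.shape[1]
    W1 = params[:d * n_hidden].reshape(d, n_hidden)
    b1 = params[d * n_hidden:d * n_hidden + n_hidden]
    w2 = params[d * n_hidden + n_hidden:]
    h = np.maximum(X @ W1 + b1, 0.0)            # ReLU hidden layer
    return np.mean((h @ w2 - y) ** 2)

n_hidden, d = 16, X.shape[1]
n_params = d * n_hidden + n_hidden + n_hidden
theta = rng.standard_normal(n_params) * 0.1     # random starting point in parameter space

# Random walk: small isotropic Gaussian steps, loss recorded at every step.
step_size = 0.01
trace = []
for t in range(5000):
    theta += step_size * rng.standard_normal(n_params)
    trace.append(loss(theta, X, y, n_hidden))

# The statistics of `trace` (e.g. its fluctuations and autocorrelation) probe landscape ruggedness.
print(f"loss after walk: {trace[-1]:.4f}, minimum along walk: {min(trace):.4f}")
```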
Machine learning force-field models for metallic spin glass
Shi, Menglin, Zhang, Sheng, Chern, Gia-Wei
Metallic spin glass systems, such as dilute magnetic alloys, are characterized by randomly distributed local moments coupled to each other through a long-range electron-mediated effective interaction. We present a scalable machine learning (ML) framework for dynamical simulations of metallic spin glasses. A Behler-Parrinello type neural-network model, based on the principle of locality, is developed to accurately and efficiently predict the electron-induced local magnetic fields that drive the spin dynamics. A crucial component of the ML model is a proper symmetry-invariant representation of the local magnetic environment, which is the direct input to the neural net. We develop such a magnetic descriptor by incorporating the spin degrees of freedom into the atom-centered symmetry-function methods widely used in ML force-field models for quantum molecular dynamics. We apply our approach to study the relaxation dynamics of an amorphous generalization of the s-d model. Our work highlights the promising potential of ML models for large-scale dynamical modeling of itinerant magnets with quenched disorder.
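The sketch below illustrates the ingredients named in the abstract under toy assumptions: a hand-rolled rotation-invariant "magnetic descriptor" built from neighbour spins and distances, an untrained Behler-Parrinello-style network mapping that descriptor to a local energy, and a local field obtained by differentiating the total energy. The specific symmetry functions, cutoffs, and architecture are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy amorphous configuration: random positions in a box and random unit spins (assumption).
N = 20
positions = rng.uniform(0.0, 5.0, size=(N, 3))
spins = rng.standard_normal((N, 3))
spins /= np.linalg.norm(spins, axis=1, keepdims=True)

def descriptor(i, positions, spins, cutoffs=(1.5, 2.5, 3.5)):
    """Toy rotation-invariant magnetic descriptor for site i (illustrative, not the paper's)."""
    dist = np.linalg.norm(positions - positions[i], axis=1)
    feats = []
    for rc in cutoffs:
        mask = (dist > 1e-9) & (dist < rc)
        fc = 0.5 * (np.cos(np.pi * dist[mask] / rc) + 1.0)   # smooth radial cutoff
        feats.append(np.sum(fc * (spins[mask] @ spins[i])))  # spin-pair invariants S_i . S_j
        feats.append(np.sum(fc))                             # structural invariants
    return np.array(feats)

# Small Behler-Parrinello-style network (untrained, random weights): descriptor -> local energy.
W1 = rng.standard_normal((6, 16)) * 0.3
b1 = np.zeros(16)
w2 = rng.standard_normal(16) * 0.3

def local_energy(i, positions, spins):
    g = descriptor(i, positions, spins)
    return np.tanh(g @ W1 + b1) @ w2

def total_energy(positions, spins):
    return sum(local_energy(j, positions, spins) for j in range(len(spins)))

def local_field(i, positions, spins, eps=1e-4):
    """Electron-induced local field H_i = -dE/dS_i, here by finite differences."""
    H = np.zeros(3)
    for a in range(3):
        for sgn in (+1.0, -1.0):
            s = spins.copy()
            s[i, a] += sgn * eps
            H[a] -= sgn * total_energy(positions, s) / (2 * eps)
    return H

# One relaxation sweep: rotate each spin toward its local field and renormalise.
for i in range(N):
    spins[i] += 0.1 * local_field(i, positions, spins)
    spins[i] /= np.linalg.norm(spins[i])

print("energy after one sweep:", total_energy(positions, spins))
```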
Optimisation via encodings: a renormalisation group perspective
Klemm, Konstantin, Mehta, Anita, Stadler, Peter F.
Difficult, in particular NP-complete, optimization problems are traditionally solved approximately using search heuristics. These are usually slowed down by the rugged landscapes encountered, because local minima arrest the search process. Cover-encoding maps were devised to circumvent this problem by transforming the original landscape into one that is free of local minima and enriched in near-optimal solutions. By definition, these maps take the original (larger) search space into smaller subspaces, by processes that typically amount to a form of coarse-graining. In this paper, we explore the details of this coarse-graining using formal arguments, as well as concrete examples of cover-encoding maps, which are investigated analytically as well as computationally. Our results strongly suggest that the coarse-graining involved in cover-encoding maps bears a strong resemblance to that encountered in renormalisation group schemes. Given the apparently disparate nature of these two formalisms, these strong similarities are rather startling, and suggest deep mathematical underpinnings that await further exploration.
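A minimal sketch of the idea, with an invented rugged 1D landscape and a naive block cover standing in for the paper's cover-encoding maps: each coarse state covers a block of fine states and inherits the best cost in its cover, so local minima are progressively washed out while the global optimum is preserved, in the spirit of an RG-like coarse-graining.

```python
import numpy as np

rng = np.random.default_rng(2)

# A rugged 1D cost landscape over 1024 states: smooth trend plus quenched noise (assumption).
n = 1024
x = np.arange(n)
cost = (x - 700) ** 2 / 1e4 + 2.0 * rng.standard_normal(n)

def coarse_grain(cost, block=4):
    """Cover-encoding-style map: each coarse state covers `block` fine states
    and is assigned the minimum cost over its cover."""
    return cost[: len(cost) // block * block].reshape(-1, block).min(axis=1)

def count_local_minima(c):
    """Interior states strictly lower than both neighbours."""
    return int(np.sum((c[1:-1] < c[:-2]) & (c[1:-1] < c[2:])))

levels = [cost]
while len(levels[-1]) > 4:
    levels.append(coarse_grain(levels[-1]))

for k, c in enumerate(levels):
    print(f"level {k}: {len(c):5d} states, {count_local_minima(c):4d} local minima, "
          f"best cost {c.min():.3f}")
```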
Bayesian Inference with Nonlinear Generative Models: Comments on Secure Learning
Bereyhi, Ali, Loureiro, Bruno, Krzakala, Florent, Müller, Ralf R., Schulz-Baldes, Hermann
Unlike the classical linear model, nonlinear generative models have been addressed only sparsely in the literature. This work aims to bring attention to these models and their secrecy potential. To this end, we invoke the replica method to derive the asymptotic normalized cross entropy in an inverse probability problem whose generative model is described by a Gaussian random field with a generic covariance function. Our derivations further demonstrate the asymptotic statistical decoupling of Bayesian inference algorithms and specify the decoupled setting for a given nonlinear model. The replica solution shows that strictly nonlinear models exhibit an all-or-nothing phase transition: there exists a critical load at which the optimal Bayesian inference changes from being perfect to uncorrelated learning. This finding leads to the design of a new secure coding scheme which achieves the secrecy capacity of the wiretap channel. The proposed coding has a significantly smaller codebook size compared to the random coding scheme of Wyner. This interesting result implies that strictly nonlinear generative models are perfectly secure without any secure coding. We justify this latter statement through the analysis of an illustrative model for perfectly secure and reliable inference.
Phase Diagram of Restricted Boltzmann Machines and Generalised Hopfield Networks with Arbitrary Priors
Barra, Adriano, Genovese, Giuseppe, Sollich, Peter, Tantari, Daniele
Restricted Boltzmann Machines are described by the Gibbs measure of a bipartite spin glass, which in turn corresponds to that of a generalised Hopfield network. This equivalence allows us to characterise the state of these systems in terms of retrieval capabilities, both at low and high load. We study the paramagnetic-spin glass and the spin glass-retrieval phase transitions, as the pattern (i.e. weight) distribution and spin (i.e. unit) priors vary smoothly from Gaussian real variables to Boolean discrete variables. Our analysis shows that the presence of a retrieval phase is robust and not peculiar to the standard Hopfield model with Boolean patterns. The retrieval region is larger when the pattern entries and retrieval units become more peaked and, conversely, when the hidden units acquire a broader prior and therefore respond more strongly to high fields. Moreover, at low load retrieval always exists below some critical temperature, for every pattern distribution ranging from the Boolean to the Gaussian case.
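The RBM-Hopfield correspondence quoted in the first sentence can be checked numerically on a small system. The sketch below assumes binary (±1) visible spins, Gaussian hidden units with unit variance, and unit inverse temperature; integrating out the hidden layer yields Hopfield couplings J_ij = Σ_μ ξ_i^μ ξ_j^μ, and the two Gibbs measures over the visible spins coincide.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)

N, P = 6, 3                                  # visible spins and patterns, small enough for exact sums
xi = rng.choice([-1.0, 1.0], size=(P, N))    # Boolean (+/-1) patterns

# Bipartite (RBM-like) model: joint weight exp( sum_{i,mu} xi[mu,i] s_i z_mu - 0.5 * sum_mu z_mu^2 ).
# Integrating out each Gaussian hidden unit z_mu gives exp( 0.5 * (xi[mu] @ s)^2 ) up to a constant.
def marginal_weight_rbm(s):
    a = xi @ s                               # hidden-unit input fields
    return np.exp(0.5 * np.sum(a ** 2))      # exact Gaussian integral (constant dropped)

# Generalised Hopfield model with couplings J_ij = sum_mu xi[mu,i] xi[mu,j].
J = xi.T @ xi

def weight_hopfield(s):
    return np.exp(0.5 * s @ J @ s)

# The two Gibbs measures over the visible spins coincide.
states = [np.array(s) for s in product([-1.0, 1.0], repeat=N)]
p_rbm = np.array([marginal_weight_rbm(s) for s in states])
p_hop = np.array([weight_hopfield(s) for s in states])
p_rbm /= p_rbm.sum()
p_hop /= p_hop.sum()
print("max probability difference:", np.abs(p_rbm - p_hop).max())   # numerically zero
```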
Why Does Deep Learning Not Have a Local Minimum?
Editor's note: This post originally appeared as an answer to a Quora question, which also included the following: "As I understand, the chance of having a derivative zero in each of the thousands of directions is low. Is there some other reason besides this?" Yes, there is a 'theoretical justification', and it has taken a couple of decades to flesh it out. I will first point out, however, that it has been observed in practice. This was pointed out by LeCun in his early work on LeNet, and is actually discussed in the 'orange book', "Pattern Classification" by Richard O. Duda, Peter E. Hart, and David G. Stork. The problem was addressed in condensed matter physics 20 years ago, in the study of spin glasses.
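The standard way to make the "zero derivative in thousands of directions" argument concrete is a random-matrix heuristic: model the Hessian at a generic critical point as a random symmetric (GOE-like) matrix and ask how often all of its eigenvalues are positive. The sketch below is that heuristic only, not a computation on any actual network.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_goe(n, n_samples):
    """Stack of random symmetric matrices, a toy model of Hessians at critical points."""
    A = rng.standard_normal((n_samples, n, n))
    return (A + np.transpose(A, (0, 2, 1))) / np.sqrt(2 * n)

n_samples = 50_000
for n in (2, 4, 6, 8, 10):
    eig = np.linalg.eigvalsh(sample_goe(n, n_samples))
    p_minimum = np.mean(np.all(eig > 0, axis=1))   # fraction of critical points that are minima
    frac_neg = np.mean(eig < 0)                    # typical fraction of descent directions
    print(f"dim {n:2d}: P(all eigenvalues > 0) ~ {p_minimum:.4f}, "
          f"mean fraction of negative directions ~ {frac_neg:.2f}")
```

Already at a few tens of dimensions the all-positive event essentially never occurs, which is the intuition behind "almost all critical points are saddles" in high-dimensional, spin-glass-like landscapes.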
Optimization Search Finds a Heart of Glass
Photo caption: Stanford University visiting researcher Alireza Marandi (right) and postdoctoral scholar Peter McMahon inspect a prototype of a new light-based computer.
A 20th-century theoretical model of the way magnetism develops in cooling solids is driving the development of analog computers that could deliver results with much less electrical power than today's supercomputers. But the work may instead yield improved digital algorithms rather than a mainstream analog architecture. Helmut Katzgraber, associate professor at Texas A&M in College Station, TX, argues, "There is a deep synergy between classical optimization, statistical physics, high-performance computing, and quantum computing. Those things really go hand in hand. Nature is the best optimizer out there. Lightning typically chooses the path of least resistance. A soap bubble will always give you the minimal surface."
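As a generic illustration of the "classical optimization meets statistical physics" theme in Katzgraber's quote (and not of the optical Ising-machine hardware described in the article), the sketch below runs plain simulated annealing on a random Ising spin-glass instance.

```python
import numpy as np

rng = np.random.default_rng(5)

# Random Ising spin-glass instance: symmetric Gaussian couplings, zero diagonal (assumption).
N = 100
J = rng.standard_normal((N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)

def energy(s):
    return -0.5 * s @ J @ s

# Simulated annealing: single-spin-flip Metropolis moves with a slowly decreasing temperature.
s = rng.choice([-1.0, 1.0], size=N)
for T in np.geomspace(5.0, 0.01, 20_000):
    i = rng.integers(N)
    dE = 2.0 * s[i] * (J[i] @ s)          # energy change from flipping spin i
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        s[i] = -s[i]

print("final energy per spin:", energy(s) / N)
```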