arXiv Paper Spotlight: Why Does Deep and Cheap Learning Work So Well?

#artificialintelligence

Why does deep learning work so well? A recent paper by Henry W. Lin (Harvard) and Max Tegmark (MIT), titled "Why does deep and cheap learning work so well?", examines the question from a different, physics-grounded perspective. It also introduces (at least, to me) the term "cheap learning." First off, to be clear, "cheap learning" does not refer to using a low-end GPU; it refers to learning that succeeds with far fewer parameters than a generic function of the same inputs would demand. The central idea of the paper is that neural network success owes as much to physics as it does to mathematics (perhaps more): the processes that generate real-world data produce exceptionally simple functions, thanks to properties such as symmetry, locality, compositionality, and polynomial log-probability, and it is this simplicity that deep networks exploit when modelling reality. You may have heard something about this in September; this is the paper on which that news was based.
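
To make "polynomial log-probability" concrete, here is a minimal sketch of the idea in my own notation (a schematic summary, not a quotation of the paper's formalism): data produced by a physical process typically follow a Boltzmann-like distribution

$$p(x) = \frac{e^{-H(x)}}{Z}, \qquad Z = \sum_{x} e^{-H(x)},$$

where the Hamiltonian $H(x)$, the negative log-probability up to a constant, is a polynomial of low order in the components of $x$. A generic distribution over $n$ binary variables needs on the order of $2^n$ parameters, while a low-order polynomial $H$ needs only polynomially many, which is one sense in which the learning problem becomes "cheap."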


The power of deeper networks for expressing natural functions

arXiv.org Machine Learning

It is well-known that neural networks are universal approximators, but that deeper networks tend to be much more efficient than shallow ones. We shed light on this by proving that the total number of neurons $m$ required to approximate natural classes of multivariate polynomials of $n$ variables grows only linearly with $n$ for deep neural networks, but grows exponentially when merely a single hidden layer is allowed. We also provide evidence that when the number of hidden layers is increased from $1$ to $k$, the neuron requirement grows exponentially not with $n$ but with $n^{1/k}$, suggesting that the minimum number of layers required for computational tractability grows only logarithmically with $n$.
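
As a concrete illustration of why depth pays off for polynomials, below is a small numeric sketch (mine, not code from the paper) of the standard four-neuron multiplication gadget that constructions of this kind build on. With a smooth activation whose second derivative at zero is nonzero (softplus here), four neurons approximate the product of two inputs; chaining the gadget in a balanced binary tree then multiplies $n$ inputs with $O(n)$ neurons spread over $O(\log n)$ layers, whereas a single hidden layer needs exponentially many neurons for the same monomial.

```python
import numpy as np

def softplus(u):
    return np.log1p(np.exp(u))

def approx_product(x, y, a=0.01):
    """Approximate x*y with four softplus neurons.

    Expanding softplus around 0, the even Taylor terms of
    s(a(x+y)) + s(-a(x+y)) - s(a(x-y)) - s(-a(x-y))
    reduce to s''(0) * a^2 * ((x+y)^2 - (x-y)^2) = s''(0) * a^2 * 4xy,
    so dividing by 4 * a^2 * s''(0) recovers x*y up to O(a^2) error.
    """
    s2_at_0 = 0.25  # second derivative of softplus at 0
    num = (softplus(a * (x + y)) + softplus(-a * (x + y))
           - softplus(a * (x - y)) - softplus(-a * (x - y)))
    return num / (4 * a**2 * s2_at_0)

print(approx_product(1.3, -0.7), 1.3 * -0.7)  # both close to -0.91
```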


Learning hard quantum distributions with variational autoencoders

arXiv.org Machine Learning

Studying general quantum many-body systems is one of the major challenges in modern physics because it requires an amount of computational resources that scales exponentially with the size of the system. Simulating the evolution of a state, or even storing its description, rapidly becomes intractable for exact classical algorithms. Recently, machine learning techniques, in the form of restricted Boltzmann machines, have been proposed as a way to efficiently represent certain quantum states with applications in state tomography and ground state estimation. Here, we introduce a new representation of states based on variational autoencoders. Variational autoencoders are a type of generative model in the form of a neural network. We probe the power of this representation by encoding probability distributions associated with states from different classes. Our simulations show that deep networks give a better representation for states that are hard to sample from, while providing no benefit for random states. This suggests that the probability distributions associated with hard quantum states might have a compositional structure that can be exploited by layered neural networks. Specifically, we consider the learnability of a class of quantum states introduced by Fefferman and Umans. Such states are provably hard to sample for classical computers, but not for quantum ones, under plausible computational complexity assumptions. The good level of compression achieved for hard states suggests these methods can be suitable for characterising states of the size expected in first generation quantum hardware.
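
For readers unfamiliar with the model class, here is a minimal, generic variational autoencoder for $n$-bit samples (e.g. measurement outcomes), sketched in PyTorch; the class name, layer widths, and latent dimension are illustrative choices of mine, not the architecture used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryVAE(nn.Module):
    """Minimal VAE over n-bit samples (hypothetical sizes, for illustration only)."""
    def __init__(self, n_bits=16, latent=4, hidden=64):
        super().__init__()
        self.enc = nn.Linear(n_bits, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec1 = nn.Linear(latent, hidden)
        self.dec2 = nn.Linear(hidden, n_bits)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample the latent code differentiably.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.dec2(F.relu(self.dec1(z)))
        return logits, mu, logvar

def vae_loss(logits, x, mu, logvar):
    # Negative evidence lower bound: Bernoulli reconstruction term plus
    # KL divergence from the approximate posterior to the N(0, I) prior.
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Training minimises this loss over observed bitstrings; generating new samples then amounts to drawing $z \sim \mathcal{N}(0, I)$ and sampling bits from the decoder's Bernoulli probabilities.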


Deep Learning Explainability: Hints from Physics

#artificialintelligence

Nowadays, artificial intelligence is present in almost every part of our lives. Smartphones, social media feeds, recommendation engines, online ad networks, and navigation tools are some examples of AI-based applications that already affect us every day. Deep learning in areas such as speech recognition, autonomous driving, machine translation, and visual object recognition has been systematically improving the state of the art for a while now. However, the reasons that make deep neural networks (DNNs) so powerful are only heuristically understood, i.e. we know only from experience that we can achieve excellent results by using large datasets and following specific training protocols. Recently, one possible explanation was proposed, based on a remarkable analogy between a physics-based conceptual framework called the renormalization group (RG) and a type of neural network known as a restricted Boltzmann machine (RBM).
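
To make the RBM side of that analogy concrete, below is a minimal Bernoulli-Bernoulli RBM with a single contrastive-divergence (CD-1) training step, written as a generic NumPy sketch of mine rather than the construction used in the RG correspondence. The relevant feature is that the hidden layer learns a coarser description of the visible layer, which is the property the analogy with RG coarse-graining leans on.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM: E(v, h) = -a.v - b.h - v.W.h, p(v, h) ~ exp(-E)."""
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible biases
        self.b = np.zeros(n_hidden)    # hidden biases

    def sample_h(self, v):
        p = sigmoid(self.b + v @ self.W)
        return (rng.random(p.shape) < p).astype(float), p

    def sample_v(self, h):
        p = sigmoid(self.a + h @ self.W.T)
        return (rng.random(p.shape) < p).astype(float), p

    def cd1_update(self, v0, lr=0.05):
        # One contrastive-divergence step: contrast the data-driven
        # statistics with those after a single Gibbs round trip.
        h0, p_h0 = self.sample_h(v0)
        v1, _ = self.sample_v(h0)
        _, p_h1 = self.sample_h(v1)
        self.W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        self.a += lr * (v0 - v1)
        self.b += lr * (p_h0 - p_h1)

# Hypothetical usage: train on flattened binary vectors, one sample at a time.
# rbm = RBM(n_visible=64, n_hidden=16)
# for v in data: rbm.cd1_update(v)
```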


Deep Learning the Ising Model Near Criticality

arXiv.org Machine Learning

It is well established that neural networks with deep architectures perform better than shallow networks for many tasks in machine learning. In statistical physics, while there has been recent interest in representing physical data with generative modelling, the focus has been on shallow neural networks. A natural question to ask is whether deep neural networks hold any advantage over shallow networks in representing such data. We investigate this question by using unsupervised, generative graphical models to learn the probability distribution of a two-dimensional Ising system. Deep Boltzmann machines, deep belief networks, and deep restricted Boltzmann networks are trained on thermal spin configurations from this system, and compared to the shallow architecture of the restricted Boltzmann machine. We benchmark the models, focussing on the accuracy of generating energetic observables near the phase transition, where these quantities are most difficult to approximate. Interestingly, after training the generative networks, we observe that the accuracy essentially depends only on the number of neurons in the first hidden layer of the network, and not on other model details such as network depth or model type. This is evidence that shallow networks are more efficient than deep networks at representing physical probability distributions associated with Ising systems near criticality.
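
For readers who want to reproduce the kind of training data described here, below is a small NumPy sketch of my own (not the authors' code) that draws thermal spin configurations of the two-dimensional Ising model with single-spin-flip Metropolis updates. Near the critical temperature $T_c \approx 2.269$ (in units of $J/k_B$) these configurations are the hardest to model, and they become RBM-ready inputs once the spins are mapped from $\{-1, +1\}$ to $\{0, 1\}$ and flattened.

```python
import numpy as np

rng = np.random.default_rng(1)

def ising_metropolis(L=16, T=2.269, n_sweeps=500):
    """Sample one thermal configuration of the 2D Ising model (J = 1, periodic
    boundaries) on an L x L lattice at temperature T via Metropolis updates."""
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L), rng.integers(L)
            neighbours = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                          + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * neighbours  # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
    return spins

config = ising_metropolis()
print(config.mean())  # magnetisation per spin of one sample near criticality
```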