Tegmark, Max

A popular approach for predicting the future of dynamical systems involves mapping them into a lower-dimensional "latent space" where prediction is easier. We show that the information-theoretically optimal approach uses different mappings for present and future, in contrast to state-of-the-art machine-learning approaches where both mappings are the same. We illustrate this dichotomy by predicting the time-evolution of coupled harmonic oscillators with dissipation and thermal noise, showing how the optimal 2-mapping method significantly outperforms principal component analysis and all other approaches that use a single latent representation, and discuss the intuitive reason why two representations are better than one. We conjecture that a single latent representation is optimal only for time-reversible processes, not for e.g. text, speech, music or out-of-equilibrium physical systems.

Jing, Li (Massachusetts Institute of Technology) | Gulcehre, Caglar (MILA - Universite de Montreal) | Peurifoy, John (Massachusetts Institute of Technology) | Shen, Yichen (Massachusetts Institute of Technology) | Tegmark, Max (Massachusetts Institute of Technology) | Soljacic, Marin (Massachusetts Institute of Technology) | Bengio, Yoshua (MILA - Universite de Montreal)

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory. We achieve this by extending unitary RNNs with a gating mechanism. Our model is able to outperform LSTMs, GRUs and Unitary RNNs on several long-term dependency benchmark tasks. We empirically both show the orthogonal/unitary RNNs lack the ability to forget and also the ability of GORU to simultaneously remember long term dependencies while forgetting irrelevant information. This plays an important role in recurrent neural networks. We provide competitive results along with an analysis of our model on many natural sequential tasks including the bAbI Question Answering, TIMIT speech spectrum prediction, Penn TreeBank, and synthetic tasks that involve long-term dependencies such as algorithmic, parenthesis, denoising and copying tasks.

Jing, Li, Gulcehre, Caglar, Peurifoy, John, Shen, Yichen, Tegmark, Max, Soljačić, Marin, Bengio, Yoshua

Lin, Henry W., Tegmark, Max, Rolnick, David

We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss, for example, we show that $n$ variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.

Rolnick, David, Tegmark, Max

It is well-known that neural networks are universal approximators, but that deeper networks tend to be much more efficient than shallow ones. We shed light on this by proving that the total number of neurons $m$ required to approximate natural classes of multivariate polynomials of $n$ variables grows only linearly with $n$ for deep neural networks, but grows exponentially when merely a single hidden layer is allowed. We also provide evidence that when the number of hidden layers is increased from $1$ to $k$, the neuron requirement grows exponentially not with $n$ but with $n^{1/k}$, suggesting that the minimum number of layers required for computational tractability grows only logarithmically with $n$.

Jing, Li, Shen, Yichen, Dubček, Tena, Peurifoy, John, Skirlo, Scott, LeCun, Yann, Tegmark, Max, Soljačić, Marin

Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Network (EUNNs); its main advantages can be summarized as follows. Firstly, the representation capacity of the unitary space in an EUNN is fully tunable, ranging from a subspace of SU(N) to the entire unitary space. Secondly, the computational complexity for training an EUNN is merely $\mathcal{O}(1)$ per parameter. Finally, we test the performance of EUNNs on the standard copying task, the pixel-permuted MNIST digit recognition benchmark as well as the Speech Prediction Test (TIMIT). We find that our architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of the final performance and/or the wall-clock training speed. EUNNs are thus promising alternatives to RNNs and LSTMs for a wide variety of applications.

Russell, Stuart, Dewey, Daniel, Tegmark, Max

Success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to investigate how to maximize these benefits while avoiding potential pitfalls. This article gives numerous examples (which should by no means be construed as an exhaustive list) of such worthwhile research aimed at ensuring that AI remains robust and beneficial.

Russell, Stuart (University of California, Berkeley) | Dewey, Daniel (Oxford University) | Tegmark, Max (Massachusetts Institute of Technology)

Tegmark, Max (Massachusetts Institute of Technology)

Relentless progress in artificial intelligence (AI) is increasingly raising concerns that machines will replace humans on the job market, and perhaps altogether. Eliezer Yudkowski and others have explored the possibility that a promising future for humankind could be guaranteed by a superintelligent "Friendly AI" , designed to safeguard humanity and its values. I will argue that, from a physics perspective where everything is simply an arrangement of elementary particles, this might be even harder than it appears. Indeed, it may require thinking rigorously about the meaning of life: What is "meaning" in a particle arrangement? What is "life"? What is the ultimate ethical imperative, i.e., how should we strive to rearrange the particles of our Universe and shape its future? If we fail to answer the last question rigorously, this future is unlikely to contain humans.