AITopics | Tegmark, Max

Collaborating Authors

Tegmark, Max

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pareto-optimal data compression for binary classification tasks

Tegmark, Max, Wu, Tailin

arXiv.org Machine LearningAug-23-2019

The goal of lossy data compression is to reduce the storage cost of a data set $X$ while retaining as much information as possible about something ($Y$) that you care about. For example, what aspects of an image $X$ contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping $X\to Z\equiv f(X)$ that maximizes the mutual information $I(Z,Y)$ while the entropy $H(Z)$ is kept below some fixed threshold. We present a method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable $X$ (an image, say) drawn from a class $Y\in\{1,...,n\}$ can be distilled into a vector $W=f(X)\in \mathbb{R}^{n-1}$ losslessly, so that $I(W,Y)=I(X,Y)$; for example, for a binary classification task of cats and dogs, each image $X$ is mapped into a single real number $W$ retaining all information that helps distinguish cats from dogs. For the $n=2$ case of binary classification, we then show how $W$ can be further compressed into a discrete variable $Z=g_\beta(W)\in\{1,...,m_\beta\}$ by binning $W$ into $m_\beta$ bins, in such a way that varying the parameter $\beta$ sweeps out the full Pareto frontier, solving a generalization of the Discrete Information Bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing $I(Z,Y)$ for a fixed number of bins $m=2,3...$ which can be conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm.

information, neural network, optimization problem, (19 more...)

arXiv.org Machine Learning

1908.08961

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback

Learnability for the Information Bottleneck

Wu, Tailin, Fischer, Ian, Chuang, Isaac L., Tegmark, Max

arXiv.org Machine LearningJul-17-2019

The Information Bottleneck (IB) method (\cite{tishby2000information}) provides an insightful and principled approach for balancing compression and prediction for representation learning. The IB objective $I(X;Z)-\beta I(Y;Z)$ employs a Lagrange multiplier $\beta$ to tune this trade-off. However, in practice, not only is $\beta$ chosen empirically without theoretical guidance, there is also a lack of theoretical understanding between $\beta$, learnability, the intrinsic nature of the dataset and model capacity. In this paper, we show that if $\beta$ is improperly chosen, learning cannot happen -- the trivial representation $P(Z|X)=P(Z)$ becomes the global minimum of the IB objective. We show how this can be avoided, by identifying a sharp phase transition between the unlearnable and the learnable which arises as $\beta$ is varied. This phase transition defines the concept of IB-Learnability. We prove several sufficient conditions for IB-Learnability, which provides theoretical guidance for choosing a good $\beta$. We further show that IB-learnability is determined by the largest confident, typical, and imbalanced subset of the examples (the conspicuous subset), and discuss its relation with model capacity. We give practical algorithms to estimate the minimum $\beta$ for a given dataset. We also empirically demonstrate our theoretical conditions with analyses of synthetic datasets, MNIST, and CIFAR10.

artificial intelligence, conference paper, neural network, (18 more...)

arXiv.org Machine Learning

1907.07331

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

AI Feynman: a Physics-Inspired Method for Symbolic Regression

Udrescu, Silviu-Marian, Tegmark, Max

arXiv.org Artificial IntelligenceMay-27-2019

A core challenge for both physics and artificial intellicence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult test set, we improve the state of the art success rate from 15% to 90%.

artificial intelligence, bf 10, neural network, (18 more...)

arXiv.org Artificial Intelligence

1905.11481

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Add feedback

The role of artificial intelligence in achieving the Sustainable Development Goals

Vinuesa, Ricardo, Azizpour, Hossein, Leite, Iolanda, Balaam, Madeline, Dignum, Virginia, Domisch, Sami, Felländer, Anna, Langhans, Simone, Tegmark, Max, Nerini, Francesco Fuso

arXiv.org Artificial IntelligenceApr-30-2019

The emergence of artificial intelligence (AI) and its progressively wider impact on many sectors across the society requires an assessment of its effect on sustainable development. Here we analyze published evidence of positive or negative impacts of AI on the achievement of each of the 17 goals and 169 targets of the 2030 Agenda for Sustainable Development. We find that AI can support the achievement of 128 targets across all SDGs, but it may also inhibit 58 targets. Notably, AI enables new technologies that improve efficiency and productivity, but it may also lead to increased inequalities among and within countries, thus hindering the achievement of the 2030 Agenda. The fast development of AI needs to be supported by appropriate policy and regulation. Otherwise, it would lead to gaps in transparency, accountability, safety and ethical standards of AI-based technology, which could be detrimental towards the development and sustainable use of AI. Finally, there is a lack of research assessing the medium- and long-term impacts of AI. It is therefore essential to reinforce the global debate regarding the use of AI and to develop the necessary regulatory insight and oversight for AI-based technologies.

energy conservation, sdg, survey article, (19 more...)

arXiv.org Artificial Intelligence

1905.00501

Country:

Europe (1.00)
North America > United States > Massachusetts (0.14)

Genre: Research Report > Experimental Study (0.68)

Industry:

Banking & Finance (0.94)
Information Technology (0.93)
Government (0.93)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)

Add feedback

Latent Representations of Dynamical Systems: When Two is Better Than One

Tegmark, Max

arXiv.org Machine LearningFeb-20-2019

A popular approach for predicting the future of dynamical systems involves mapping them into a lower-dimensional "latent space" where prediction is easier. We show that the information-theoretically optimal approach uses different mappings for present and future, in contrast to state-of-the-art machine-learning approaches where both mappings are the same. We illustrate this dichotomy by predicting the time-evolution of coupled harmonic oscillators with dissipation and thermal noise, showing how the optimal 2-mapping method significantly outperforms principal component analysis and all other approaches that use a single latent representation, and discuss the intuitive reason why two representations are better than one. We conjecture that a single latent representation is optimal only for time-reversible processes, not for e.g. text, speech, music or out-of-equilibrium physical systems.

artificial intelligence, information, machine learning, (19 more...)

arXiv.org Machine Learning

1902.03364

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Gated Orthogonal Recurrent Units: On Learning to Forget

Jing, Li (Massachusetts Institute of Technology) | Gulcehre, Caglar (MILA - Universite de Montreal) | Peurifoy, John (Massachusetts Institute of Technology) | Shen, Yichen (Massachusetts Institute of Technology) | Tegmark, Max (Massachusetts Institute of Technology) | Soljacic, Marin (Massachusetts Institute of Technology) | Bengio, Yoshua (MILA - Universite de Montreal)

AAAI ConferencesApr-6-2018

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory. We achieve this by extending unitary RNNs with a gating mechanism. Our model is able to outperform LSTMs, GRUs and Unitary RNNs on several long-term dependency benchmark tasks. We empirically both show the orthogonal/unitary RNNs lack the ability to forget and also the ability of GORU to simultaneously remember long term dependencies while forgetting irrelevant information. This plays an important role in recurrent neural networks. We provide competitive results along with an analysis of our model on many natural sequential tasks including the bAbI Question Answering, TIMIT speech spectrum prediction, Penn TreeBank, and synthetic tasks that involve long-term dependencies such as algorithmic, parenthesis, denoising and copying tasks.

gated orthogonal recurrent unit, learning

AAAI Conferences

Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Gated Orthogonal Recurrent Units: On Learning to Forget

Jing, Li, Gulcehre, Caglar, Peurifoy, John, Shen, Yichen, Tegmark, Max, Soljačić, Marin, Bengio, Yoshua

arXiv.org Machine LearningOct-25-2017

deep learning, neural network, rnn, (17 more...)

arXiv.org Machine Learning

1706.02761

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Why does deep and cheap learning work so well?

Lin, Henry W., Tegmark, Max, Rolnick, David

arXiv.org Machine LearningAug-3-2017

We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss, for example, we show that $n$ variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.

artificial intelligence, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

doi: 10.1007/s10955-017-1836-5

1608.08225

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

The power of deeper networks for expressing natural functions

Rolnick, David, Tegmark, Max

arXiv.org Machine LearningMay-15-2017

It is well-known that neural networks are universal approximators, but that deeper networks tend to be much more efficient than shallow ones. We shed light on this by proving that the total number of neurons $m$ required to approximate natural classes of multivariate polynomials of $n$ variables grows only linearly with $n$ for deep neural networks, but grows exponentially when merely a single hidden layer is allowed. We also provide evidence that when the number of hidden layers is increased from $1$ to $k$, the neuron requirement grows exponentially not with $n$ but with $n^{1/k}$, suggesting that the minimum number of layers required for computational tractability grows only logarithmically with $n$.

deep learning, neural network, neuron, (19 more...)

arXiv.org Machine Learning

1705.05502

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs

Jing, Li, Shen, Yichen, Dubček, Tena, Peurifoy, John, Skirlo, Scott, LeCun, Yann, Tegmark, Max, Soljačić, Marin

arXiv.org Machine LearningApr-3-2017

Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Network (EUNNs); its main advantages can be summarized as follows. Firstly, the representation capacity of the unitary space in an EUNN is fully tunable, ranging from a subspace of SU(N) to the entire unitary space. Secondly, the computational complexity for training an EUNN is merely $\mathcal{O}(1)$ per parameter. Finally, we test the performance of EUNNs on the standard copying task, the pixel-permuted MNIST digit recognition benchmark as well as the Speech Prediction Test (TIMIT). We find that our architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of the final performance and/or the wall-clock training speed. EUNNs are thus promising alternatives to RNNs and LSTMs for a wide variety of applications.

deep learning, matrix, neural network, (16 more...)

arXiv.org Machine Learning

1612.05231

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.50)

Industry: Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback