AITopics | Anandkumar, Anima

Anandkumar, Anima

Information about AI from the News, Publications, and Conferences

Learning compositional functions via multiplicative weight updates

Bernstein, Jeremy, Zhao, Jiawei, Meister, Markus, Liu, Ming-Yu, Anandkumar, Anima, Yue, Yisong

arXiv.org Machine Learning, Jun-25-2020

Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well-known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam, a multiplicative version of the Adam optimiser, and show that it can train state-of-the-art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.
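
The multiplicative update idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `multiplicative_step`, the Adam-style bias correction, the clipping of the normalised gradient, and the toy regression problem are all assumptions made for illustration. The point it shows is that each weight is rescaled by a positive factor, so magnitudes change but signs never flip.

```python
import numpy as np

def multiplicative_step(w, grad, v, t, lr=0.01, beta=0.999):
    """One multiplicative update (hypothetical helper, not the paper's code)."""
    # Running average of squared gradients, used to normalise the gradient.
    v = beta * v + (1.0 - beta) * grad**2
    v_hat = v / (1.0 - beta**t)  # bias correction (an assumption, borrowed from Adam)
    # Clipping the normalised gradient is an assumption that bounds the step size.
    g_hat = np.clip(grad / (np.sqrt(v_hat) + 1e-12), -1.0, 1.0)
    # Scale each weight by exp(-lr * sign(w) * g_hat): a positive factor.
    w_new = w * np.exp(-lr * np.sign(w) * g_hat)
    return w_new, v

# Toy problem (assumed): recover fixed target weights by least squares.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -0.5, 1.5])
X = rng.normal(size=(256, 3))
y = X @ w_true

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

w = np.array([1.0, -1.0, 1.0])  # initial signs match the target's signs
w0 = w.copy()
v = np.zeros_like(w)

loss_before = loss(w)
for t in range(1, 2001):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    w, v = multiplicative_step(w, grad, v, t)
loss_after = loss(w)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.6f}")
```

Because the update cannot move a weight across zero, the sketch starts from weights whose signs already match the target; a real training setup would have to deal with that constraint in its own way.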

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

2006.14560

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Competitive Policy Optimization

Prajapat, Manish, Azizzadenesheli, Kamyar, Liniger, Alexander, Yue, Yisong, Anandkumar, Anima

arXiv.org Machine Learning, Jun-18-2020

A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations, and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods, and empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.

approximation, game theory, neural network, (19 more...)

arXiv.org Machine Learning

2006.10611

Country:

Europe (0.67)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (1.00)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Competitive Gradient Descent

Schaefer, Florian, Anandkumar, Anima

Neural Information Processing Systems, Mar-18-2020, 23:33:42 GMT

We introduce a new algorithm for the numerical computation of Nash equilibria of competitive two-player games. Our method is a natural generalization of gradient descent to the two-player setting where the update is given by the Nash equilibrium of a regularized bilinear local approximation of the underlying game. It avoids oscillatory and divergent behaviors seen in alternating gradient descent. Using numerical experiments and rigorous analysis, we provide a detailed comparison to methods based on optimism and consensus and show that our method avoids making any unnecessary changes to the gradient dynamics while achieving exponential (local) convergence for (locally) convex-concave zero-sum games. Convergence and stability properties of our method are robust to strong interactions between the players, without adapting the stepsize, which is not the case with previous methods.
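
The behaviour described in this abstract can be seen on the simplest bilinear game. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the scalar zero-sum game f(x, y) = x * y (x minimises, y maximises), uses the zero-sum form of the competitive update with the mixed derivative written out by hand, and compares it against simultaneous gradient descent-ascent, which is a simpler baseline than the alternating variant the abstract mentions. The function names and the step size are hypothetical choices.

```python
def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - eta * y, y + eta * x

def cgd_step(x, y, eta):
    """Competitive update for the zero-sum game min_x max_y f(x, y) = x * y.

    For this f: df/dx = y, df/dy = x, and the mixed derivative d2f/dxdy = 1.
    """
    grad_x, grad_y, d_xy = y, x, 1.0
    # Scalar form of the inverse (I + eta^2 * D_xy * D_yx)^-1.
    scale = 1.0 / (1.0 + eta**2 * d_xy**2)
    # Each player's step accounts for the other player's anticipated move.
    dx = -eta * scale * (grad_x + eta * d_xy * grad_y)
    dy = eta * scale * (grad_y - eta * d_xy * grad_x)
    return x + dx, y + dy

def run(step, n_steps=100, eta=0.2):
    x, y = 1.0, 1.0
    for _ in range(n_steps):
        x, y = step(x, y, eta)
    return (x * x + y * y) ** 0.5  # distance from the equilibrium (0, 0)

start_distance = 2.0 ** 0.5
gda_distance = run(gda_step)
cgd_distance = run(cgd_step)
print(f"start: {start_distance:.3f}, GDA: {gda_distance:.3e}, CGD: {cgd_distance:.3e}")
```

On this game the squared distance to the equilibrium grows by a factor of (1 + eta^2) per step under simultaneous descent-ascent and shrinks by the same factor under the competitive update, so the first spirals outward while the second spirals inward.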

artificial intelligence, game theory, gradient descent, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Neural Operator: Graph Kernel Network for Partial Differential Equations

Li, Zongyi, Kovachki, Nikola, Azizzadenesheli, Kamyar, Liu, Burigede, Bhattacharya, Kaushik, Stuart, Andrew, Anandkumar, Anima

arXiv.org Machine Learning, Mar-6-2020

The classical development of neural networks has been primarily for mappings between a finite-dimensional Euclidean space and a set of classes, or between two finite-dimensional Euclidean spaces. The purpose of this work is to generalize neural networks so that they can learn mappings between infinite-dimensional spaces (operators). The key innovation in our work is that a single set of network parameters, within a carefully designed network architecture, may be used to describe mappings between infinite-dimensional spaces and between different finite-dimensional approximations of those spaces. We formulate approximation of the infinite-dimensional mapping by composing nonlinear activation functions and a class of integral operators. The kernel integration is computed by message passing on graph networks. This approach has substantial practical consequences, which we illustrate in the context of mappings between input data to partial differential equations (PDEs) and their solutions. In this context, such learned networks can generalize among different approximation methods for the PDE (such as finite difference or finite element methods) and among approximations corresponding to different underlying levels of resolution and discretization. Experiments confirm that the proposed graph kernel network does have the desired properties and show competitive performance compared to state-of-the-art solvers.
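
The statement that kernel integration is computed by message passing, with one set of parameters shared across discretizations, can be sketched as follows. This is a simplified illustration and not the paper's architecture: the kernel here is a fixed scalar Gaussian rather than a learned network, the domain is a 1D interval, and `kernel_layer`, `kernel_scale`, and `radius` are hypothetical names.

```python
import numpy as np

def kernel_layer(points, features, weight, kernel_scale, radius):
    """One layer: v_i <- relu(W v_i + mean over neighbours j of k(x_i, x_j) v_j)."""
    dists = np.abs(points[:, None] - points[None, :])  # pairwise distances on a 1D domain
    neighbours = dists <= radius                        # graph edges (self-loops included)
    kernel = np.exp(-(dists / kernel_scale) ** 2) * neighbours
    # Averaging messages over neighbours approximates the kernel integral.
    messages = (kernel @ features) / neighbours.sum(axis=1, keepdims=True)
    return np.maximum(features @ weight.T + messages, 0.0)

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 4)) * 0.1  # shared parameters, independent of the grid size

outputs = {}
for n in (32, 128):
    grid = np.linspace(0.0, 1.0, n)
    # Input features: four sine modes sampled on the current grid, shape (n, 4).
    feats = np.stack([np.sin(2 * np.pi * k * grid) for k in range(1, 5)], axis=1)
    outputs[n] = kernel_layer(grid, feats, weight, kernel_scale=0.1, radius=0.25)
    print(n, outputs[n].shape)
```

The same `weight` matrix and kernel settings are applied to both the 32-point and the 128-point grid; only the graph built from the sample points changes with the resolution.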

approximation, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

2003.03485

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

Anandkumar, Anima, Hsu, Daniel J., Janzamin, Majid, Kakade, Sham M.

Neural Information Processing Systems, Feb-14-2020, 17:57:25 GMT

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher order" expansion conditions on the topic-word matrix or the population structure of the model.

artificial intelligence, decomposition, natural language, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)

Implicit competitive regularization in GANs

Schäfer, Florian, Zheng, Hongkai, Anandkumar, Anima

arXiv.org Machine Learning, Oct-13-2019

Generative adversarial networks (GANs) are capable of producing high-quality samples, but they suffer from numerous issues such as instability and mode collapse during training. To combat this, we propose to model the generator and discriminator as agents acting under local information, uncertainty, and awareness of their opponent. By doing so we achieve stable convergence, even when the underlying game has no Nash equilibria. We call this mechanism implicit competitive regularization (ICR) and show that it is present in the recently proposed competitive gradient descent (CGD). When comparing CGD to Adam using a variety of loss functions and regularizers on CIFAR10, CGD shows a much more consistent performance, which we attribute to ICR. In our experiments, we achieve the highest inception score when using the WGAN loss (without gradient penalty or weight clipping) together with CGD. This can be interpreted as minimizing a form of integral probability metric based on ICR. Generative adversarial networks (GANs) (Goodfellow et al., 2014) are a class of generative models based on a competitive game between a generator that tries to generate realistic new data and a discriminator that tries to distinguish generated from real data.

artificial intelligence, discriminator, neural network, (16 more...)

arXiv.org Machine Learning

1910.05852

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Out-of-Distribution Detection Using Neural Rendering Generative Models

Huang, Yujia, Dai, Sihui, Nguyen, Tan, Baraniuk, Richard G., Anandkumar, Anima

arXiv.org Machine Learning, Jul-10-2019

Out-of-distribution (OoD) detection is a natural downstream task for deep generative models, due to their ability to learn the input probability distribution. There are mainly two classes of approaches for OoD detection using deep generative models: those based on a likelihood measure and those based on the reconstruction loss. However, both approaches are unable to carry out OoD detection effectively, especially when the OoD samples have smaller variance than the training samples. For instance, both flow-based and VAE models assign higher likelihood to images from SVHN when trained on CIFAR-10 images. We use a recently proposed generative model known as the neural rendering model (NRM) and derive metrics for OoD. We show that NRM unifies both approaches since it provides a likelihood estimate and also carries out reconstruction in each layer of the neural network. Among various measures, we found the joint likelihood of latent variables to be the most effective one for OoD detection. Our results show that when trained on CIFAR-10, lower likelihood (of latent variables) is assigned to SVHN images. Additionally, we show that this metric is consistent across other OoD datasets. To the best of our knowledge, this is the first work to show consistently lower likelihood for OoD data with smaller variance with deep generative models.

deep learning, latent variable, neural network, (20 more...)

arXiv.org Machine Learning

1907.04572

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)

Learning Causal State Representations of Partially Observable Environments

Zhang, Amy, Lipton, Zachary C., Pineda, Luis, Azizzadenesheli, Kamyar, Anandkumar, Anima, Itti, Laurent, Pineau, Joelle, Furlanello, Tommaso

arXiv.org Machine Learning, Jun-25-2019

Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially-observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents using multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvement of the DQN and tabular models using approximate causal state representations with respect to recurrent-DQN baselines trained with raw inputs.

neural network, representation, us government, (19 more...)

arXiv.org Machine Learning

1906.10437

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry:

Education (0.66)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Robust Regression for Safe Exploration in Control

Liu, Anqi, Shi, Guanya, Chung, Soon-Jo, Anandkumar, Anima, Yue, Yisong

arXiv.org Machine Learning, Jun-13-2019

We study the problem of safe learning and exploration in sequential control problems. The goal is to safely collect data samples from an operating environment to learn an optimal controller. A central challenge in this setting is how to quantify uncertainty in order to choose provably-safe actions that allow us to collect useful data and reduce uncertainty, thereby achieving both improved safety and optimality. To address this challenge, we present a deep robust regression model that is trained to directly predict the uncertainty bounds for safe exploration. We then show how to integrate our robust regression approach with model-based control methods by learning a dynamic model with robustness bounds. We derive generalization bounds under domain shifts for learning and connect them with safety and stability bounds in control. We demonstrate empirically that our robust regression approach can outperform conventional Gaussian process (GP) based safe exploration in settings where it is difficult to specify a good GP prior.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Machine Learning

1906.05819

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Stochastically Rank-Regularized Tensor Regression Networks

Kolbeinsson, Arinbjörn, Kossaifi, Jean, Panagakis, Yannis, Anandkumar, Anima, Tzoulaki, Ioanna, Matthews, Paul

arXiv.org Machine Learning, Feb-27-2019

Over-parametrization of deep neural networks has recently been shown to be key to their successful training. However, it also renders them prone to overfitting and makes them expensive to store and train. Tensor regression networks significantly reduce the number of effective parameters in deep neural networks while retaining accuracy and the ease of training. They replace the flattening and fully-connected layers with a tensor regression layer, where the regression weights are expressed through the factors of a low-rank tensor decomposition. In this paper, to further improve tensor regression networks, we propose a novel stochastic rank-regularization. It consists of a novel randomized tensor sketching method to approximate the weights of tensor regression layers. We theoretically and empirically establish the link between our proposed stochastic rank-regularization and the dropout on low-rank tensor regression. Extensive experimental results with both synthetic data and real world datasets (i.e., CIFAR-100 and the UK Biobank brain MRI dataset) support that the proposed approach i) improves performance in both classification and regression tasks, ii) decreases overfitting, iii) leads to more stable training and iv) improves robustness to adversarial attacks and random noise.
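
The idea of expressing regression weights through the factors of a low-rank decomposition can be sketched in a few lines. This is an assumed, simplified setup rather than the authors' layer: it uses a CP (sum of rank-one terms) factorization, a single scalar output, and made-up tensor sizes, and it does not include the stochastic rank-regularization itself. It shows that the output can be computed from the factors without forming the full weight tensor, and how the parameter count compares with a flattened fully-connected layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activation tensor from a convolutional backbone: (height, width, channels).
H, W, C, rank = 8, 8, 32, 4
activation = rng.normal(size=(H, W, C))

# CP factors: the full weight tensor is the sum over r of a[:, r] x b[:, r] x c[:, r].
a = rng.normal(size=(H, rank))
b = rng.normal(size=(W, rank))
c = rng.normal(size=(C, rank))
bias = 0.1

# Regression output computed directly from the factors.
y_factored = np.einsum("hwc,hr,wr,cr->", activation, a, b, c) + bias

# Reference: build the full weight tensor and take its inner product with the activation.
full_weights = np.einsum("hr,wr,cr->hwc", a, b, c)
y_full = np.sum(activation * full_weights) + bias

params_factored = rank * (H + W + C)  # parameters stored in the three factor matrices
params_dense = H * W * C              # parameters in a flatten + fully-connected weight
print(f"factored: {params_factored} parameters, dense: {params_dense} parameters")
```

With these assumed sizes the factored form stores rank * (H + W + C) numbers instead of H * W * C, and the rank controls how strongly the weights are constrained.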

deep learning, neural network, tensor, (18 more...)

arXiv.org Machine Learning

1902.10758

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
