Involutive MCMC: a Unifying Framework
Neklyudov, Kirill, Welling, Max, Egorov, Evgenii, Vetrov, Dmitry
Markov Chain Monte Carlo (MCMC) is a computational approach to fundamental problems such as inference, integration, optimization, and simulation. The field has developed a broad spectrum of algorithms, varying in the way they are motivated, the way they are applied, and how efficiently they sample. Despite all the differences, many of them share the same core principle, which we unify as the Involutive MCMC (iMCMC) framework. Building upon this, we describe a wide range of MCMC algorithms in terms of iMCMC, and formulate a number of "tricks" which one can use as design principles for developing new MCMC algorithms. Thus, iMCMC provides a unified view of many known MCMC algorithms, which facilitates the derivation of powerful extensions. We demonstrate the latter with two examples where we transform known reversible MCMC algorithms into more efficient irreversible ones.
Table 1: List of algorithms that we describe by the Involutive MCMC framework, with the appendix giving each algorithm's description and formulation in terms of iMCMC.
Metropolis-Hastings (Hastings, 1970): B.1
Mixture Proposal (Habib & Barber, 2018): B.2
Multiple-Try Metropolis (Liu et al., 2000): B.3
Sample-Adaptive MCMC (Zhu, 2019): B.4
Reversible-Jump MCMC (Green, 1995): B.5
Hybrid Monte Carlo (Duane et al., 1987): B.6
RMHMC (Girolami & Calderhead, 2011): B.7
NeuTra (Hoffman et al., 2019): B.8
A-NICE-MC (Song et al., 2017): B.9
L2HMC (Levy et al., 2017): B.10
Persistent HMC (Horowitz, 1991): B.11
Gibbs (Geman & Geman, 1984): B.12
Look Ahead (Sohl-Dickstein et al., 2014): B.13
NRJ (Gagnon & Doucet, 2019): B.14
Lifted MH (Turitsyn et al., 2011): B.15
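A minimal sketch of the shared iMCMC step, with assumptions of my own choosing rather than any specific algorithm from Table 1: draw an auxiliary variable, apply a deterministic involution, and accept or reject with a Metropolis-Hastings test on the joint density. The Gaussian auxiliary distribution and the swap involution below (which recovers an independence-type sampler) are illustrative placeholders; the involution is volume preserving, so no Jacobian factor appears.

```python
import numpy as np

def log_std_normal(z):
    # Log density of a standard Gaussian auxiliary variable, up to a constant.
    return -0.5 * np.sum(z ** 2)

def imcmc_step(x, log_prob, rng):
    # 1) Sample the auxiliary variable v ~ p(v | x); here simply N(0, I).
    v = rng.standard_normal(x.shape)
    # 2) Apply a deterministic involution f(x, v) = (x', v'). The swap
    #    f(x, v) = (v, x) satisfies f(f(x, v)) = (x, v) and is volume
    #    preserving, so no Jacobian term enters the acceptance ratio.
    x_new, v_new = v, x
    # 3) Metropolis-Hastings test on the joint density p(x) p(v | x).
    log_alpha = (log_prob(x_new) + log_std_normal(v_new)
                 - log_prob(x) - log_std_normal(v))
    return x_new if np.log(rng.uniform()) < log_alpha else x

# Toy usage: sample from an unnormalized 2D Gaussian centred at 2.
rng = np.random.default_rng(0)
log_prob = lambda x: -0.5 * np.sum((x - 2.0) ** 2)
x = np.zeros(2)
chain = []
for _ in range(1000):
    x = imcmc_step(x, log_prob, rng)
    chain.append(x)
```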
MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
van der Pol, Elise, Worrall, Daniel E., van Hoof, Herke, Oliehoek, Frans A., Welling, Max
This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
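As a toy illustration of the equivariance constraint behind MDP homomorphic networks, the sketch below builds an equivariant linear map for a two-element reflection group by group-averaging an unconstrained weight matrix. The state and action representations (a hypothetical 4-dimensional state flip and a left/right action swap) are made up for the example; the paper itself constructs an equivariant basis numerically rather than by symmetrization.

```python
import numpy as np

def symmetrize(W, reps_in, reps_out):
    # Project W onto the space of equivariant maps:
    # W_eq = (1/|G|) * sum_g rho_out(g)^{-1} @ W @ rho_in(g).
    return sum(np.linalg.inv(Ro) @ W @ Ri
               for Ri, Ro in zip(reps_in, reps_out)) / len(reps_in)

# Hypothetical example: a reflection group {e, s} that flips a 4-dim state
# and swaps a 2-dim (left/right) action.
flip4 = np.eye(4)[::-1]                 # rho_in(s): reverse the state vector
swap2 = np.eye(2)[::-1]                 # rho_out(s): swap the two action logits
reps_in = [np.eye(4), flip4]
reps_out = [np.eye(2), swap2]

W_eq = symmetrize(np.random.randn(2, 4), reps_in, reps_out)

# Equivariance check: rho_out(g) @ W_eq == W_eq @ rho_in(g) for all g in G.
assert np.allclose(swap2 @ W_eq, W_eq @ flip4)
```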
RE-MIMO: Recurrent and Permutation Equivariant Neural MIMO Detection
Pratik, Kumar, Rao, Bhaskar D., Welling, Max
In this paper, we present a novel neural network for MIMO symbol detection. It is motivated by several important considerations in wireless communication systems: permutation equivariance and a variable number of users. The neural detector learns an iterative decoding algorithm that is implemented as a stack of iterative units. Each iterative unit is a neural computation module comprising three sub-modules: the likelihood module, the encoder module, and the predictor module. The likelihood module injects information about the generative (forward) process into the neural network. The encoder-predictor modules together update the state vector and symbol estimates. The encoder module updates the state vector and employs a transformer-based attention network to handle the interactions among the users in a permutation equivariant manner. The predictor module refines the symbol estimates. The modular and permutation equivariant architecture allows for dealing with a varying number of users. The resulting neural detector architecture is unique and exhibits several desirable properties unseen in any of the previously proposed neural detectors. We compare its performance against existing methods and the results show the ability of our network to efficiently handle a variable number of transmitters with high accuracy.
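A rough sketch of how such a stack of iterative units might look, assuming a real-valued forward model y = Hx; the feature choices, the GRU-style state update, and all layer sizes are guesses for illustration, not the paper's architecture. The self-attention over the user axis is what makes the update permutation equivariant and independent of the number of users.

```python
import torch
import torch.nn as nn

class IterativeUnit(nn.Module):
    # One iterative unit (sketch): likelihood, encoder and predictor sub-modules.
    def __init__(self, d):
        super().__init__()
        self.likelihood = nn.Linear(2, d)                    # embeds per-user likelihood features
        self.encoder = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(d, d)                       # state update shared across users
        self.predictor = nn.Linear(d, 1)                     # refined per-user symbol estimate

    def forward(self, state, x_hat, y, H):
        # state: (B, N, d), x_hat: (B, N), y: (B, M), H: (B, M, N), with N users.
        residual = y - torch.bmm(H, x_hat.unsqueeze(-1)).squeeze(-1)                 # (B, M)
        backproj = torch.bmm(H.transpose(1, 2), residual.unsqueeze(-1)).squeeze(-1)  # (B, N)
        feats = self.likelihood(torch.stack([backproj, x_hat], dim=-1))              # (B, N, d)
        # Self-attention over the user axis: permutation equivariant and size-agnostic.
        ctx, _ = self.encoder(state + feats, state + feats, state + feats)
        B, N, d = ctx.shape
        state = self.update(ctx.reshape(B * N, d), state.reshape(B * N, d)).reshape(B, N, d)
        return state, x_hat + self.predictor(state).squeeze(-1)

# Unrolled detector: the same kind of unit is applied for several iterations and
# handles any number of users N without retraining.
unit = IterativeUnit(d=32)
B, M, N = 8, 16, 4
H, x = torch.randn(B, M, N), torch.randn(B, N)
y = torch.bmm(H, x.unsqueeze(-1)).squeeze(-1)
state, x_hat = torch.zeros(B, N, 32), torch.zeros(B, N)
for _ in range(5):
    state, x_hat = unit(state, x_hat, y, H)
```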
SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks
Fuchs, Fabian B., Worrall, Daniel E., Fischer, Volker, Welling, Max
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the data input. A positive corollary of equivariance is increased weight-tying within the model, leading to fewer trainable parameters and thus decreased sample complexity (i.e. we need less training data). The SE(3)-Transformer leverages the benefits of self-attention to operate on large point clouds with varying number of points, while guaranteeing SE(3)-equivariance for robustness. We evaluate our model on a toy $N$-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input. We further achieve competitive performance on two real-world datasets, ScanObjectNN and QM9. In all cases, our model outperforms a strong, non-equivariant attention baseline and an equivariant model without attention.
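The sketch below is a much-simplified illustration of attention whose weights depend only on roto-translation-invariant quantities and whose vector output is built from relative positions, so it transforms predictably under rotations; it is closer to a plain invariant-attention layer than to the paper's tensor-field-based construction, and all shapes and features are illustrative.

```python
import torch

def equivariant_attention(pos, feats):
    # pos: (N, 3) coordinates, feats: (N, d) scalar (invariant) per-point features.
    # Attention scores use only invariant quantities (distances, scalar features);
    # the vector output is a weighted sum of relative positions, so it rotates
    # with the input and ignores global translations.
    rel = pos[None, :, :] - pos[:, None, :]             # (N, N, 3) relative positions
    dist = rel.norm(dim=-1)                             # (N, N) invariant distances
    attn = torch.softmax(feats @ feats.T - dist, dim=-1)
    vec_out = (attn.unsqueeze(-1) * rel).sum(dim=1)     # (N, 3) equivariant vectors
    scalar_out = attn @ feats                           # (N, d) invariant scalars
    return vec_out, scalar_out

# Sanity check: rotating the input rotates the vector output and leaves scalars unchanged.
N, d = 5, 8
pos, feats = torch.randn(N, 3), torch.randn(N, d)
R, _ = torch.linalg.qr(torch.randn(3, 3))               # a random orthogonal matrix
v0, s0 = equivariant_attention(pos, feats)
v1, s1 = equivariant_attention(pos @ R.T, feats)
assert torch.allclose(v1, v0 @ R.T, atol=1e-4) and torch.allclose(s1, s0, atol=1e-4)
```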
Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data
Löwe, Sindy, Madras, David, Zemel, Richard, Welling, Max
Standard causal discovery methods must fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information - for instance, the dynamics describing the effects of causal relations - which is lost when following this approach. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus makes use of the information that is shared. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under hidden confounding.
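A compact sketch of the amortized setup: a single encoder maps each time-series sample to an inferred graph, and a single shared decoder predicts the next-step dynamics given that graph, so both networks are trained across samples that may come from different underlying graphs. The module internals below are placeholders and omit the paper's variational treatment.

```python
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    # Amortized encoder: reads a whole time series and outputs edge probabilities.
    def __init__(self, t_len, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * t_len, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, x):                                   # x: (B, n_vars, T)
        B, n, T = x.shape
        pairs = torch.cat([x.unsqueeze(2).expand(B, n, n, T),
                           x.unsqueeze(1).expand(B, n, n, T)], dim=-1)
        return torch.sigmoid(self.net(pairs)).squeeze(-1)   # (B, n, n) edge probabilities

class Decoder(nn.Module):
    # Shared dynamics model: predicts the next step given the inferred graph.
    def __init__(self, hidden=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.out = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, x_t, edges):                          # x_t: (B, n), edges: (B, n, n)
        B, n = x_t.shape
        pairs = torch.stack([x_t.unsqueeze(2).expand(B, n, n),
                             x_t.unsqueeze(1).expand(B, n, n)], dim=-1)
        m = (edges.unsqueeze(-1) * self.msg(pairs)).sum(dim=2)   # messages along inferred edges
        return x_t + self.out(torch.cat([m, x_t.unsqueeze(-1)], dim=-1)).squeeze(-1)

# One amortized training step: the same encoder/decoder pair is reused for every sample.
enc, dec = GraphEncoder(t_len=20), Decoder()
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
x = torch.randn(16, 5, 20)                                  # batch of 5-variable time series
edges = enc(x)
pred = torch.stack([dec(x[:, :, t], edges) for t in range(19)], dim=-1)
loss = ((pred - x[:, :, 1:]) ** 2).mean()                   # next-step reconstruction
opt.zero_grad(); loss.backward(); opt.step()
```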
Bayesian Bits: Unifying Quantization and Pruning
van Baalen, Mart, Louizos, Christos, Nagel, Markus, Amjad, Rana Ali, Wang, Ying, Blankevoort, Tijmen, Welling, Max
We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide whether or not to add this quantized residual error for a higher effective bit width and lower quantization noise. By starting with a power-of-two bit width, this decomposition will always produce hardware-friendly configurations, and through an additional 0-bit option, serves as a unified view of pruning and quantization. Bayesian Bits then introduces learnable stochastic gates, which collectively control the bit width of the given tensor. As a result, we can obtain low bit solutions by performing approximate inference over the gates, with prior distributions that encourage most of them to be switched off. We further show that, under some assumptions, L0 regularization of the network parameters corresponds to a specific instance of the aforementioned framework. We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks that provide a better trade-off between accuracy and efficiency than their static bit width equivalents.
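The decomposition can be sketched as a nested, gated sum of quantized residuals: a 2-bit base quantization plus gated residual quantizers for effective bit widths 4, 8, 16 and 32, with an extra gate acting as the 0-bit (pruning) option. The hard 0/1 gates and the simple step-size rule below are illustrative simplifications; the paper uses learnable stochastic gates and derives the exact step-size recursion, and clipping is omitted here.

```python
import numpy as np

def quantize(x, step):
    # Round-to-nearest uniform quantizer with the given step size (no clipping).
    return step * np.round(x / step)

def bayesian_bits_forward(x, gates, x_range=1.0):
    # gates maps bit width -> 0/1. The rule step_b = x_range / (2**b - 1) is
    # an illustrative stand-in for the paper's derived step sizes.
    if not gates[2]:
        return np.zeros_like(x)                         # 0-bit option: the tensor is pruned
    out = quantize(x, x_range / 3)                      # 2-bit base quantization
    for b in (4, 8, 16, 32):
        if not gates[b]:
            break                                       # gates decide the effective bit width
        out = out + quantize(x - out, x_range / (2 ** b - 1))   # add quantized residual error
    return out

x = np.random.uniform(-1, 1, size=5)
print(bayesian_bits_forward(x, {2: 1, 4: 1, 8: 0, 16: 0, 32: 0}))  # roughly 4-bit behaviour
```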
Variational Bayes In Private Settings (VIPS)
Park, Mijung (Max Planck Institute for Intelligent Systems), Foulds, James, Chaudhuri, Kamalika, Welling, Max
Many applications of Bayesian data analysis involve sensitive information such as personal documents or medical records, motivating methods which ensure that privacy is protected. We introduce a general privacy-preserving framework for Variational Bayes (VB), a widely used optimization-based Bayesian inference method. Our framework respects differential privacy, the gold-standard privacy criterion, and encompasses a large class of probabilistic models, called the Conjugate Exponential (CE) family. We observe that we can straightforwardly privatise VB's approximate posterior distributions for models in the CE family, by perturbing the expected sufficient statistics of the complete-data likelihood. For a broadly-used class of non-CE models, those with binomial likelihoods, we show how to bring such models into the CE family, such that inferences in the modified model resemble the private variational Bayes algorithm as closely as possible, using the Pólya-Gamma data augmentation scheme. The iterative nature of variational Bayes presents a further challenge since iterations increase the amount of noise needed. We overcome this by combining: (1) an improved composition method for differential privacy, called the moments accountant, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets.
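The core privatisation step can be sketched as a Gaussian mechanism on the expected sufficient statistics entering a VB update: clip each per-example statistic, aggregate, and add calibrated noise. The clipping bound, noise multiplier, and the toy conjugate-exponential example below are hypothetical choices; the paper additionally accounts for the cumulative privacy cost of repeated updates with the moments accountant and subsampling amplification.

```python
import numpy as np

def privatize_suff_stats(suff_stats, clip_norm, noise_multiplier, rng):
    # Clip each per-example expected sufficient statistic to bound sensitivity,
    # sum over the (mini)batch, then add calibrated Gaussian noise.
    norms = np.linalg.norm(suff_stats, axis=1, keepdims=True)
    clipped = suff_stats * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=suff_stats.shape[1])
    return clipped.sum(axis=0) + noise

# Toy conjugate-exponential case: private update of a Gaussian mean whose
# per-example sufficient statistic is s(x_i) = x_i, with prior N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, size=(1000, 1))
noisy_stat = privatize_suff_stats(x, clip_norm=5.0, noise_multiplier=1.0, rng=rng)
posterior_mean = noisy_stat / (len(x) + 1.0)
```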
Neural Enhanced Belief Propagation on Factor Graphs
Satorras, Victor Garcia, Welling, Max
A graphical model is a structured representation of locally dependent random variables. A traditional method to reason over these random variables is to perform inference using belief propagation. When provided with the true data generating process, belief propagation can infer the optimal posterior probability estimates in tree structured factor graphs. However, in many cases we may only have access to a poor approximation of the data generating process, or we may face loops in the factor graph, leading to suboptimal estimates. In this work we first extend graph neural networks to factor graphs (FG-GNN). We then propose a new hybrid model that runs an FG-GNN conjointly with belief propagation. At every inference iteration, the FG-GNN receives the belief propagation messages as input and outputs a corrected version of them. As a result, we obtain a more accurate algorithm that combines the benefits of both belief propagation and graph neural networks. We apply our ideas to error correction decoding tasks, and we show that our algorithm can outperform belief propagation for LDPC codes on bursty channels.
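A toy version of the hybrid iteration, run here on a small pairwise binary MRF rather than a general factor graph: each round performs a standard sum-product message update and then lets a small (untrained, illustrative) network emit a correction to the messages. The graph, potentials, and correction network are placeholders, not the paper's FG-GNN.

```python
import torch
import torch.nn as nn

edges = [(0, 1), (1, 2), (2, 0)]                       # loopy triangle graph
n = 3
unary = torch.rand(n, 2) + 0.1                         # node potentials
psi = {e: torch.rand(2, 2) + 0.1 for e in edges}       # edge potentials psi[x_i, x_j]
directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
msgs = {d: torch.ones(2) / 2 for d in directed}
refine = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

def potential(i, j):
    # Return the edge potential oriented as [x_i, x_j].
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

for _ in range(10):
    # 1) Standard sum-product update for every directed message m_{i->j}(x_j).
    new_msgs = {}
    for (i, j) in directed:
        incoming = torch.ones(2)
        for (k, l) in directed:
            if l == i and k != j:
                incoming = incoming * msgs[(k, l)]
        m = (unary[i] * incoming) @ potential(i, j)    # sum over x_i
        new_msgs[(i, j)] = m / m.sum()
    # 2) Neural refinement: correct each message given its old and new value.
    for d in directed:
        corrected = new_msgs[d] * torch.sigmoid(refine(torch.cat([msgs[d], new_msgs[d]])))
        msgs[d] = corrected / corrected.sum()

# Approximate marginals from the (corrected) messages.
for i in range(n):
    belief = unary[i].clone()
    for (k, l) in directed:
        if l == i:
            belief = belief * msgs[(k, l)]
    print(i, belief / belief.sum())
```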
On Herding and the Perceptron Cycling Theorem
Gelfand, Andrew, Chen, Yutian, van der Maaten, Laurens, Welling, Max
The paper develops a connection between traditional perceptron algorithms and recently introduced herding algorithms. It is shown that both algorithms can be viewed as an application of the perceptron cycling theorem. This connection strengthens some herding results and suggests new (supervised) herding algorithms that, like CRFs or discriminative RBMs, make predictions by conditioning on the input attributes. We develop and investigate variants of conditional herding, and show that conditional herding leads to practical algorithms that perform better than or on par with related classifiers such as the voted perceptron and the discriminative RBM.
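A minimal sketch of the (unconditional) herding recursion that the perceptron cycling theorem is applied to: repeatedly pick the state maximizing the current inner product with the weights, then update the weights with the gap between the data moments and the chosen state's features. The binary state space and feature map are hypothetical; conditional herding additionally conditions this recursion on input attributes.

```python
import numpy as np

states = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])          # all binary states
phi = lambda s: np.array([s[0], s[1], s[0] * s[1]], float)   # hypothetical feature map
features = np.stack([phi(s) for s in states])                # (4, 3)

data = np.array([[1, 1], [1, 0], [1, 1], [0, 1]])            # toy dataset
phi_bar = np.stack([phi(x) for x in data]).mean(axis=0)      # target moments

w = np.zeros(3)
samples = []
for t in range(1000):
    s_idx = int(np.argmax(features @ w))    # maximization (perceptron-like) step
    samples.append(states[s_idx])
    w = w + phi_bar - features[s_idx]       # update keeps w bounded (cycling theorem)

# Empirical moments of the herded samples approach phi_bar at rate O(1/T).
print(phi_bar, np.stack([phi(s) for s in samples]).mean(axis=0))
```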
Graphical Generative Adversarial Networks
LI, Chongxuan, Welling, Max, Zhu, Jun, Zhang, Bo
We propose Graphical Generative Adversarial Networks (Graphical-GAN) to model structured data. We introduce a structured recognition model to infer the posterior distribution of latent variables given observations. We generalize the Expectation Propagation (EP) algorithm to learn the generative model and recognition model jointly. We present two instances of Graphical-GAN, namely Gaussian Mixture GAN (GMGAN) and State Space GAN (SSGAN), which can successfully learn the discrete and temporal structures on visual datasets, respectively.
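As a loose, heavily simplified illustration of the flavour of objective involved, the sketch below sets up an ALI-style pairing for the Gaussian-mixture case: a generator with a structured latent (discrete component plus Gaussian code), a recognition network inferring that latent from data, and a discriminator comparing the two joints. All architectures and the relaxed discrete code are placeholders and do not reproduce the paper's EP-based learning procedure.

```python
import torch
import torch.nn as nn

K, zdim, xdim = 4, 8, 2
gen = nn.Sequential(nn.Linear(K + zdim, 64), nn.ReLU(), nn.Linear(64, xdim))
rec = nn.Sequential(nn.Linear(xdim, 64), nn.ReLU(), nn.Linear(64, K + zdim))
disc = nn.Sequential(nn.Linear(xdim + K + zdim, 64), nn.ReLU(), nn.Linear(64, 1))

def sample_prior(B):
    # Structured latent: a discrete mixture component k and a Gaussian code z.
    k = nn.functional.one_hot(torch.randint(K, (B,)), K).float()
    z = torch.randn(B, zdim)
    return k, z

x_real = torch.randn(64, xdim)                       # stand-in for a data batch
k, z = sample_prior(64)
x_fake = gen(torch.cat([k, z], dim=-1))              # generative joint (x, k, z)
h = rec(x_real)                                      # recognition joint (x, k, z)
k_hat = torch.softmax(h[:, :K], dim=-1)              # relaxed discrete code (sketch)
z_hat = h[:, K:]
d_fake = disc(torch.cat([x_fake, k, z], dim=-1))
d_real = disc(torch.cat([x_real, k_hat, z_hat], dim=-1))
# Discriminator loss for adversarial joint matching; generator/recognition
# updates and optimizers are omitted in this sketch.
loss_d = (nn.functional.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + nn.functional.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
```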