AITopics

2002.03375

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Mikkola, Petrus, Todorović, Milica, Järvi, Jari, Rinke, Patrick, Kaski, Samuel

Projective Preferential Bayesian Optimization

arXiv.org Machine Learning, Feb-8-2020

Bayesian optimization is an effective method for finding extrema of a black-box function. We propose a new type of Bayesian optimization for learning user preferences in high-dimensional spaces. The central assumption is that the underlying objective function cannot be evaluated directly, but instead a minimizer along a projection can be queried, which we call a projective preferential query. The form of the query allows for feedback that is natural for a human to give, and which enables interaction. This is demonstrated in a user experiment in which the user feedback comes in the form of optimal position and orientation of a molecule adsorbing to a surface. We demonstrate that our framework is able to find a global minimum of a high-dimensional black-box function, which is an infeasible task for existing preferential Bayesian optimization frameworks that are based on pairwise comparisons.
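
The following is a minimal sketch of a simulated projective preferential query. It stands a toy quadratic `black_box` in for the unknown objective and uses a grid search in place of the human answer; both are hypothetical. It shows only the form of the query: the "user" returns a minimizer along a one-dimensional projection `x + alpha * direction`. The Bayesian model that the paper builds on top of such answers is not reproduced here.

```python
import math

def black_box(x):
    """Hypothetical stand-in for the unknown objective f.
    Its global minimum is at (1.0, -2.0, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def projective_query(f, x, direction, lo=-5.0, hi=5.0, steps=1001):
    """Simulated user answer: the alpha in [lo, hi] minimising f(x + alpha * direction).
    A uniform grid search replaces the human here (an assumption for illustration)."""
    best_alpha, best_val = lo, math.inf
    for i in range(steps):
        alpha = lo + (hi - lo) * i / (steps - 1)
        point = [xi + alpha * di for xi, di in zip(x, direction)]
        val = f(point)
        if val < best_val:
            best_alpha, best_val = alpha, val
    return best_alpha

# Ask one coordinate direction at a time, starting from the origin.
x = [0.0, 0.0, 0.0]
for axis in range(3):
    direction = [1.0 if j == axis else 0.0 for j in range(3)]
    alpha = projective_query(black_box, x, direction)
    x = [xi + alpha * di for xi, di in zip(x, direction)]
print(x)
```

Because the toy objective is separable, a single pass of coordinate-wise queries already reaches its minimizer. A real black-box function generally does not behave this way, which is why the answers are fed into a probabilistic surrogate model rather than used greedily.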

optimization, preferential query, query

2002.03113

Country:

North America > Canada > British Columbia (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.50)

Industry:

Transportation (0.55)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Dimitrakakis, Christos, Eriksson, Hannes, Jorge, Emilio, Grover, Divya, Basu, Debabrota

Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions

arXiv.org Machine Learning, Feb-8-2020

Bayesian reinforcement learning (BRL) offers a decision-theoretic solution to the problem of reinforcement learning. However, typical model-based BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search. This paper describes a novel backwards induction principle for performing joint Bayesian estimation of models and value functions, from which many new BRL algorithms can be obtained. We demonstrate this idea with algorithms and experiments in discrete state spaces.
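
For contrast with the proposed principle, the sketch below shows the conventional route the abstract describes: maintain a posterior over models, then apply dynamic programming. It rests on several assumptions: a tabular MDP, a Dirichlet posterior over transitions, known rewards, and finite-horizon backward induction. It is not the paper's inferential induction algorithm. Each sampled MDP is solved exactly, so the returned list of value functions is a Monte Carlo sample from the value-function distribution implied by the model posterior. The example counts and rewards are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    # Dirichlet draw via normalised Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def backward_induction(P, R, horizon):
    """Finite-horizon backward induction for a known MDP.
    P[s][a][s2] is a transition probability, R[s][a] an immediate reward.
    Returns the optimal value of each state with `horizon` steps to go."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [max(R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                 for a in range(len(P[s])))
             for s in range(n_states)]
    return V

def value_samples(counts, R, horizon, n_samples, seed=0):
    """Sample transition models from a Dirichlet(counts + 1) posterior and solve
    each one, giving samples from the induced distribution over value functions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        P = [[sample_dirichlet([c + 1.0 for c in counts[s][a]], rng)
              for a in range(len(counts[s]))]
             for s in range(len(counts))]
        samples.append(backward_induction(P, R, horizon))
    return samples

# Hypothetical 2-state, 2-action problem: counts[s][a][s2] are observed transitions.
counts = [
    [[9, 1], [2, 8]],  # state 0: action 0 mostly stays, action 1 mostly reaches state 1
    [[1, 9], [5, 5]],  # state 1: action 0 mostly stays
]
R = [[0.0, 0.0], [1.0, 1.0]]  # known rewards: 1 for acting in state 1
samples = value_samples(counts, R, horizon=5, n_samples=200)
mean_v = [sum(v[s] for v in samples) / len(samples) for s in range(2)]
print(mean_v)
```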

algorithm, value function, value function distribution

2002.03098

Country:

North America > United States > New Jersey (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

#artificialintelligence, Feb-7-2020, 18:52:00 GMT

Overcoming Mode Collapse and the Curse of Dimensionality

Machine Learning Lecture at Carnegie Mellon University (CMU) by Ke Li, Ph.D. Candidate at the University of California, Berkeley.

Abstract: In this talk, Li presents his team's work on overcoming two long-standing problems in machine learning and algorithms:

1. Mode collapse in generative adversarial nets (GANs). Generative adversarial nets (GANs) are perhaps the most popular class of generative models in use today. Unfortunately, they suffer from the well-documented problem of mode collapse, which the many successive variants of GANs have failed to overcome. I will illustrate why mode collapse happens fundamentally and show a simple way to overcome it, which is the basis of a new method known as Implicit Maximum Likelihood Estimation (IMLE).

2. The curse of dimensionality. It turns out that this problem is not insurmountable: I will explain how the curse of dimensionality arises and show a simple way to overcome it, which gives rise to a new family of algorithms known as Dynamic Continuous Indexing (DCI).

Bio: Ke Li is a recent Ph.D. graduate from UC Berkeley, where he was advised by Prof. Jitendra Malik, and will join Google as a Research Scientist and the Institute for Advanced Study (IAS) as a Member hosted by Prof. Sanjeev Arora.
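
The sketch below illustrates the IMLE-style objective as we understand it. A GAN asks whether each generated sample looks real; IMLE instead asks that every data point have some generated sample nearby. This is a hypothetical illustration with toy 2-D points and squared Euclidean distance. It has no generator and no training loop, and it is not the speaker's implementation. It only shows why this objective penalizes a collapsed set of samples.

```python
def imle_loss(data, samples):
    """IMLE-style objective: for every data point, the squared distance to its
    nearest generated sample, averaged over the data points."""
    total = 0.0
    for x in data:
        total += min(sum((xi - si) ** 2 for xi, si in zip(x, s)) for s in samples)
    return total / len(data)

data = [(-1.0, 0.0), (1.0, 0.0)]       # two modes
collapsed = [(1.0, 0.0), (1.0, 0.1)]   # every sample sits near one mode
covering = [(-1.0, 0.0), (1.0, 0.0)]   # one sample per mode
print(imle_loss(data, collapsed), imle_loss(data, covering))  # 2.0 0.0
```

Every sample in the collapsed set looks realistic on its own. Only the covering set, however, leaves no data point without a nearby sample, so it alone reaches zero loss.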

algorithm, dimensionality, overcoming mode collapse

#artificialintelligence

Country:

North America > United States > California > Alameda County > Berkeley (0.27)
North America > Canada > Ontario > Toronto (0.20)

Industry: Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

arXiv.org Machine Learning, Feb-7-2020

Extended Stochastic Gradient MCMC for Large-Scale Bayesian Variable Selection

Song, Qifan, Sun, Yan, Ye, Mao, Liang, Faming

Stochastic gradient Markov chain Monte Carlo (MCMC) algorithms have received much attention in Bayesian computing for big data problems, but they are only applicable to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient MCMC algorithm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional MCMC algorithms. The proposed algorithms greatly alleviate the computational burden of Bayesian methods in big data computing.
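
For reference, the sketch below shows plain stochastic gradient Langevin dynamics (SGLD). This is the kind of fixed-dimension, differentiable-posterior sampler that the abstract says existing methods are limited to. It is not the extended algorithm with latent variables proposed here. The model (unit-variance Gaussian data, a Gaussian prior on the mean) and all settings (`step`, `batch`, `prior_var`, burn-in length) are illustrative assumptions. Each update moves by half a step times a minibatch estimate of the log-posterior gradient, with the likelihood part rescaled by `N / batch`, then adds Gaussian noise whose variance equals the step size.

```python
import random

def sgld_gaussian_mean(data, n_iter=3000, batch=50, step=1e-4, prior_var=10.0, seed=0):
    """Plain SGLD for the mean theta of N(theta, 1) data under a N(0, prior_var) prior."""
    rng = random.Random(seed)
    N = len(data)
    theta = 0.0
    trace = []
    for _ in range(n_iter):
        minibatch = rng.sample(data, batch)
        grad_prior = -theta / prior_var                             # d/dtheta log prior
        grad_lik = (N / batch) * sum(x - theta for x in minibatch)  # rescaled minibatch likelihood gradient
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.gauss(0.0, step ** 0.5)
        trace.append(theta)
    return trace

rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # synthetic data
trace = sgld_gaussian_mean(data)
burned = trace[1000:]  # discard burn-in
print(sum(burned) / len(burned), sum(data) / len(data))
```

With 1,000 observations and a vague prior, the posterior mean is essentially the sample mean, so the two printed numbers should be close.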

algorithm, iteration, stochastic gradient langevin

2002.02919

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

arXiv.org Machine Learning, Feb-7-2020

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Swiatkowski, Jakub, Roth, Kevin, Veeling, Bastiaan S., Tran, Linh, Dillon, Joshua V., Mandt, Stephan, Snoek, Jasper, Salimans, Tim, Jenatton, Rodolphe, Nowozin, Sebastian

Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.
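
The following is a minimal sketch of the compact parameterization described above, for a single layer with an m x n weight matrix. The mean-field standard deviations are built from two rank-k factors instead of being stored entry by entry. The variational means are left untouched and are not shown. Passing the factors through softplus, so that every standard deviation stays positive, is our assumption: the abstract only states that the variational parameters are decomposed into a low-rank factorization. The layer sizes and the rank are hypothetical.

```python
import math
import random

def softplus(z):
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def k_tied_stddevs(U, V):
    """m x n standard deviations from factors U (m x k) and V (n x k):
    sigma[i][j] = sum_r softplus(U[i][r]) * softplus(V[j][r]).
    Every entry is positive and the matrix has rank at most k."""
    k = len(U[0])
    su = [[softplus(u) for u in row] for row in U]
    sv = [[softplus(v) for v in row] for row in V]
    return [[sum(su[i][r] * sv[j][r] for r in range(k)) for j in range(len(V))]
            for i in range(len(U))]

def stddev_param_counts(m, n, k):
    """Standard-deviation parameters: full mean field vs. rank-k factorization."""
    return m * n, k * (m + n)

rng = random.Random(0)
m, n, k = 100, 50, 2
U = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]
V = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
sigma = k_tied_stddevs(U, V)
print(stddev_param_counts(m, n, k))  # (5000, 300)
```

For a 100 x 50 layer with k = 2, the standard deviations need 300 parameters instead of 5,000.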

approximation, matrix, posterior

2002.02655

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Courts, Jarrad, Renton, Christopher, Schön, Thomas B., Wills, Adrian

Constructing a variational family for nonlinear state-space models

Mathematical models of system dynamics are a core technology in most model-based engineered systems acting and interacting with their environment. Examples include GPS, autonomous vehicles, passenger aircraft and robotics, to name just a few. The remarkable utility of mathematical models stems from the fact that, inter alia, they enable decision making based on prediction of system behaviour under new scenarios, accelerate the analysis and design processes, are fundamental to detecting faults or changes, and can handle the uncertainty present in data, assumptions and algorithms. Motivated by the broad applicability and utility of modelling, the scientific community has devoted significant research attention to learning dynamical models from data. Importantly, for dynamic systems the sequence or ordering of the data must be maintained, since future outcomes are deemed to be fundamentally related to the past. This is sometimes called sequence learning (Sun and Giles, 2001) or system identification (Ljung, 1999). In essence, these approaches search over a space of models and determine the model that best (in some sense) fits the data while maintaining the time ordering. The current paper is directed towards solving this important problem.

To make these ideas more concrete, here we assume that data from the system of interest is available in the form of a data record $y_{1:T} \triangleq \{y_1, \ldots, y_T\}$, where each measurement $y_k$ is potentially multidimensional and the number of available measurements is denoted as $T > 0$. We further assume that the data may be adequately described as an instance from a joint distribution that is parametrized by an unknown vector $\theta$ (called the parameter vector), that is (with abuse of notation)
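
To make the setting concrete, the sketch below draws one data record y_1, ..., y_T from a hypothetical scalar nonlinear state-space model with parameter vector theta = (a, b, q, r). The model form, noise levels and parameter values are illustrative assumptions, not examples taken from the paper. Each measurement depends on a latent state that is propagated forward in time, which is why the time ordering of the record has to be preserved.

```python
import math
import random

def simulate(theta, T, x0=0.0, seed=0):
    """Draw y_1, ..., y_T from a hypothetical scalar nonlinear state-space model:
        x_k = a * x_{k-1} + b * sin(x_{k-1}) + w_k,   w_k ~ N(0, q)
        y_k = x_k + v_k,                              v_k ~ N(0, r)
    with parameter vector theta = (a, b, q, r) and initial state x_0 = x0."""
    a, b, q, r = theta
    rng = random.Random(seed)
    x = x0
    ys = []
    for _ in range(T):
        x = a * x + b * math.sin(x) + rng.gauss(0.0, math.sqrt(q))  # latent state update
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))                  # noisy measurement
    return ys

ys = simulate(theta=(0.9, 0.5, 0.1, 0.05), T=100)
print(len(ys), ys[:3])
```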

approximation, iteration, state distribution

2002.0262

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (0.54)
Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Robots (0.86)

Welleck, Sean, Kulikov, Ilia, Kim, Jaedeok, Pang, Richard Yuanzhe, Cho, Kyunghyun

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.
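
The following is a minimal sketch of the consistent top-k idea. It assumes that the remedy amounts to always keeping the end-of-sequence token in the candidate set and then renormalizing over the kept tokens. The token names, probabilities and helper functions are hypothetical. Nucleus sampling could be modified in the same way, and the self-terminating model is not shown.

```python
import random

def consistent_top_k(probs, k, eos):
    """Top-k candidate set that always contains the end-of-sequence token,
    renormalised over the kept tokens."""
    top = sorted(probs, key=probs.get, reverse=True)[:k]
    keep = set(top) | {eos}
    total = sum(probs[t] for t in keep)
    return {t: probs[t] / total for t in keep}

def sample_token(dist, rng):
    # Draw one token according to the renormalised distribution.
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens])[0]

# Hypothetical next-token distribution at one decoding step.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "<eos>": 0.05}
dist = consistent_top_k(probs, k=2, eos="<eos>")
token = sample_token(dist, random.Random(0))
print(sorted(dist), token)
```

With k = 2, plain top-k would keep only "the" and "a", so "<eos>" could never be drawn. The consistent variant keeps it with probability 0.05 / 0.85 ≈ 0.059 at this step.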

algorithm, language model, sequence

2002.02492

Country:

North America > United States > Texas (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Product Kanerva Machines: Factorized Bayesian Memory

Marblestone, Adam, Wu, Yan, Wayne, Greg

An ideal cognitively-inspired memory system would compress and organize incoming items. The Kanerva Machine (Wu et al., 2018b;a) is a Bayesian model that naturally implements online memory compression. However, the organization of the Kanerva Machine is limited by its use of a single Gaussian random matrix for storage. Here we introduce the Product Kanerva Machine, which dynamically combines many smaller Kanerva Machines. Its hierarchical structure provides a principled way to abstract invariant features and gives scaling and capacity advantages over single Kanerva Machines. We show that it can exhibit unsupervised clustering, find sparse and combinatorial allocation patterns, and discover spatial tunings that approximately factorize simple images by object.

arxiv preprint arxiv, kanerva machine, reconstruction

2002.02385

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

Macroscopic Traffic Flow Modeling with Physics Regularized Gaussian Process: A New Insight into Machine Learning Applications

Yuan, Yun, Yang, Xianfeng Terry, Zhang, Zhao, Zhe, Shandian

Despite the recent widespread use of machine learning (ML) techniques in traffic flow modeling, these data-driven approaches often lack accuracy when the dataset is small or noisy. To address this issue, this study presents a new modeling framework, named physics regularized machine learning (PRML), to encode classical traffic flow models (referred to as physical models) into the ML architecture and to regularize the ML training process. More specifically, a stochastic physics regularized Gaussian process (PRGP) model is developed and a Bayesian inference algorithm is used to estimate the mean and kernel of the PRGP. A physical regularizer based on macroscopic traffic flow models is also developed to augment the estimation via a shadow GP, and an enhanced latent force model is used to encode physical knowledge into stochastic processes. Based on the posterior regularization inference framework, an efficient stochastic optimization algorithm is also developed to maximize the evidence lower bound of the system likelihood. To demonstrate the effectiveness of the proposed model, this paper conducts empirical studies on a real-world dataset collected from a stretch of the I-15 freeway in Utah. Results show that the new PRGP model can outperform previous compatible methods, such as calibrated pure physical models and pure machine learning methods, in estimation precision and input robustness.
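
The regularization idea can be sketched without a Gaussian process: predicted flows are penalized for disagreeing both with the observations and with a classical macroscopic model. The sketch assumes the Greenshields fundamental diagram as the physical model, a squared-error penalty and a weight `lam`. These choices, the helper names and the numbers are illustrative assumptions. The paper's PRGP instead encodes the physics through a shadow GP and posterior regularization, which is not reproduced here.

```python
def greenshields_flow(density, free_speed, jam_density):
    """Greenshields fundamental diagram: q = v_f * k * (1 - k / k_j)."""
    return free_speed * density * (1.0 - density / jam_density)

def physics_regularized_loss(pred_flow, obs_flow, density, free_speed, jam_density, lam):
    """Squared data misfit plus lam times the squared deviation of the predicted
    flows from the physical model evaluated at the observed densities."""
    data_term = sum((p - o) ** 2 for p, o in zip(pred_flow, obs_flow))
    physics_term = sum((p - greenshields_flow(k, free_speed, jam_density)) ** 2
                       for p, k in zip(pred_flow, density))
    return data_term + lam * physics_term

# Hypothetical values: densities in veh/km, speeds in km/h, flows in veh/h.
density = [20.0, 40.0, 60.0]
obs_flow = [1700.0, 2600.0, 3050.0]
pred_flow = [1680.0, 2640.0, 3010.0]
print(physics_regularized_loss(pred_flow, obs_flow, density, 100.0, 120.0, lam=0.5))
```

Setting `lam` to zero recovers a purely data-driven fit. Larger values pull predictions toward the physical model, which matters most when, as the abstract notes, the data are scarce or noisy.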

estimation, flow model, traffic flow model

2002.02374

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (1.00)
Consumer Products & Services > Travel (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)