AITopics

2002.08853

Country:

North America > United States > New York (0.04)
Asia > China > Hong Kong > Kowloon (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Wilson, Andrew Gordon, Izmailov, Pavel

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

arXiv.org Machine LearningFeb-20-2020

The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.

neural network, posterior, predictive distribution, (11 more...)

2002.08791

Country:

North America > United States > Ohio (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Camuto, Alexander, Willetts, Matthew, Paige, Brooks, Holmes, Chris, Roberts, Stephen

Learning Bijective Feature Maps for Linear ICA

arXiv.org Machine LearningFeb-19-2020

Separating high-dimensional data like images into independent latent factors remains an open research problem. Here we develop a method that jointly learns a linear independent component analysis (ICA) model with non-linear bijective feature maps. By combining these two methods, ICA can learn interpretable latent structure for images. For non-square ICA, where we assume the number of sources is less than the dimensionality of data, we achieve better unsupervised latent factor discovery than flow-based models and linear ICA. This performance scales to large image datasets such as CelebA.

ica, learning bijective feature map, matrix, (12 more...)

2002.07766

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Observational nonidentifiability, generalized likelihood and free energy

Allahverdyan, A. E.

We study the parameter estimation problem in mixture models with observational nonidentifiability: the full model (also containing hidden variables) is identifiable, but the marginal (observed) model is not. Hence global maxima of the marginal likelihood are (infinitely) degenerate and predictions of the marginal likelihood are not unique. We show how to generalize the marginal likelihood by introducing an effective temperature, and making it similar to the free energy. This generalization resolves the observational nonidentifiability, since its maximization leads to unique results that are better than a random selection of one degenerate maximum of the marginal likelihood or the averaging over many such maxima. The generalized likelihood inherits many features from the usual likelihood, e.g. it holds the conditionality principle, and its local maximum can be searched for via suitably modified expectation-maximization method. The maximization of the generalized likelihood relates to entropy optimization.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2002.07884

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Armenia > Yerevan > Yerevan (0.04)
North America > United States > New York > Suffolk County > Melville (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Joo, Taejong, Chung, Uijung, Seo, Min-Gwan

Being Bayesian about Categorical Probability

Neural networks utilize the softmax as a building block in classification tasks, which contains an overconfidence problem and lacks an uncertainty representation ability. As a Bayesian alternative to the softmax, we consider a random variable of a categorical probability over class labels. In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label, which provides consistent gains in generalization performance in multiple challenging tasks. The proposed method inherits advantages of Bayesian approaches that achieve better uncertainty estimation and model calibration. Our method can be implemented as a plug-and-play loss function with negligible computational overhead compared to the softmax with the cross-entropy loss function.

bayesian, categorical probability, softmax, (15 more...)

2002.07965

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Jayaram, Vivek, Thickstun, John

Source Separation with Deep Generative Priors

Despite substantial progress in signal source separation, results for richly structured data continue to contain perceptible artifacts. In contrast, recent deep generative models can produce authentic samples in a variety of domains that are indistinguishable from samples of the data distribution. This paper introduces a Bayesian approach to source separation that uses generative models as priors over the components of a mixture of sources, and Langevin dynamics to sample from the posterior distribution of sources given a mixture. This decouples the source separation problem from generative modeling, enabling us to directly use cutting-edge generative models as priors. The method achieves state-of-the-art performance for MNIST digit separation. We introduce new methodology for evaluating separation quality on richer datasets, providing quantitative evaluation of separation results on CIFAR-10. We also provide qualitative results on LSUN.

separation, separation result, source separation, (13 more...)

2002.07942

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Constraining the recent star formation history of galaxies : an Approximate Bayesian Computation approach

Aufort, G., Ciesla, L., Pudlo, P., Buat, V.

[Abridged] Although galaxies are found to follow a tight relation between their star formation rate and stellar mass, they are expected to exhibit complex star formation histories (SFH), with short-term fluctuations. The goal of this pilot study is to present a method that will identify galaxies that are undergoing a strong variation of star formation activity in the last tens to hundreds Myr. In other words, the proposed method will determine whether a variation in the last few hundreds of Myr of the SFH is needed to properly model the SED rather than a smooth normal SFH. To do so, we analyze a sample of COSMOS galaxies using high signal-to-noise ratio broad band photometry. We apply Approximate Bayesian Computation, a state-of-the-art statistical method to perform model choice, associated to machine learning algorithms to provide the probability that a flexible SFH is preferred based on the observed flux density ratios of galaxies. We present the method and test it on a sample of simulated SEDs. The input information fed to the algorithm is a set of broadband UV to NIR (rest-frame) flux ratios for each galaxy. The method has an error rate of 21% in recovering the right SFH and is sensitive to SFR variations larger than 1 dex. A more traditional SED fitting method using CIGALE is tested to achieve the same goal, based on fits comparisons through Bayesian Information Criterion but the best error rate obtained is higher, 28%. We apply our new method to the COSMOS galaxies sample. The stellar mass distribution of galaxies with a strong to decisive evidence against the smooth delayed-$\tau$ SFH peaks at lower M* compared to galaxies where the smooth delayed-$\tau$ SFH is preferred. We discuss the fact that this result does not come from any bias due to our training. Finally, we argue that flexible SFHs are needed to be able to cover that largest SFR-M* parameter space possible.

catalog, galaxy, sfh, (15 more...)

2002.07815

Country: Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

A Neural Network Based on First Principles

Baggenstoss, Paul M

In this paper, a Neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of Maximum Entropy (MaxEnt) to find the posterior distribution of the input data of each layer, conditioned on the layer output variables. This posterior has a well-defined mean, the conditional mean estimator, that is calculated using a type of neural network with theoretically-derived activation functions similar to sigmoid, softplus, and relu. This implicitly provides a theoretical justification for their use. A theorem that finds the conditional distribution and conditional mean estimator under the MaxEnt prior is proposed, unifying results for special cases. Combining layers results in an auto-encoder with conventional feed-forward analysis network and a type of linear Bayesian belief network in the reconstruction path.

activation function, manifold, surrogate density, (16 more...)

2002.07469

Country:

North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Taniguchi, Akira, Hagiwara, Yoshinobu, Taniguchi, Tadahiro, Inamura, Tetsunari

Spatial Concept-Based Navigation with Human Speech Instructions via Probabilistic Inference on Bayesian Generative Model

arXiv.org Artificial IntelligenceFeb-18-2020

Robots are required to not only learn spatial concepts autonomously but also utilize such knowledge for various tasks in a domestic environment. Spatial concept represents a multimodal place category acquired from the robot's spatial experience including vision, speech-language, and self-position. The aim of this study is to enable a mobile robot to perform navigational tasks with human speech instructions, such as `Go to the kitchen', via probabilistic inference on a Bayesian generative model using spatial concepts. Specifically, path planning was formalized as the maximization of probabilistic distribution on the path-trajectory under speech instruction, based on a control-as-inference framework. Furthermore, we described the relationship between probabilistic inference based on the Bayesian generative model and control problem including reinforcement learning. We demonstrated path planning based on human instruction using acquired spatial concepts to verify the usefulness of the proposed approach in the simulator and in real environments. Experimentally, places instructed by the user's speech commands showed high probability values, and the trajectory toward the target place was correctly estimated. Our approach, based on probabilistic inference concerning decision-making, can lead to further improvement in robot autonomy.

inference, robot, spatial concept, (17 more...)

arXiv.org Artificial Intelligence

2002.07381

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
(2 more...)

#artificialintelligenceFeb-17-2020, 23:47:44 GMT

Uber Goes to NeurIPS 2019

Sum-product networks (SPNs) are flexible density estimators and have received significant attention due to their attractive inference properties. While parameter learning in SPNs is well developed, structure learning leaves something to be desired: even though there is a plethora of SPN structure learners, most of them are somewhat ad-hoc, and based on intuition rather than a clear learning principle. In this paper, we introduce a well-principled Bayesian framework for SPN structure learning. The first is rather unproblematic and akin to neural network architecture validation. The second characterizes the effective structure of the SPN and needs to respect the usual structural constraints in SPN, i.e., completeness and decomposability.

artificial intelligence, machine learning, neurips 2019, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)