AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

Everything that Works Works Because it's Bayesian: Why Deep Nets Generalize?

@machinelearnbotJun-9-2017, 18:35:09 GMT

We could not so far claim that deep networks trained with stochastic gradient descent are Bayesian. And it may be because SGD biases learning towards flat minima, rather than sharp minima. It turns out, (Hochreiter and Schmidhuber, 1997) motivated their work on seeking flat minima from a Bayesian, minimum description length perspective. Seeking flat minima makes sense from a minimum description length perspective.

Bayesian, Bayesian inference, machine learning, (13 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.30)

Add feedback

Symmetry Learning for Function Approximation in Reinforcement Learning

Mahajan, Anuj, Tulabandhula, Theja

arXiv.org Machine LearningJun-9-2017

Reinforcement Learning (RL) is the task of training an agent to perform optimally in an environment using the reward and observation signals perceived upon taking actions which change the environment dynamics. Learning optimal behavior is inherently difficult because of challenges like credit assignment and exploration-exploitation trade offs that need to be made while converging to a solution. In many scenarios, like training a rover to move on a Martian surface, the cost of obtaining samples for learning can be high (in terms of robot's energy expenditure etc.), and so sample efficiency is an important subproblem which deserves special attention. Very often it is the case that the environment has intrinsic symmetries which can be leveraged by the agent to improve performance and learn more efficiently. For example, in the Cart-Pole domain [1, 2] the state action space is symmetric with respect to reflection about the plane perpendicular to the direction of motion of the cart (Figure 1). In fact, in many environments, the number of symmetry relations tend to increase with the dimensionality of the state space. For instance, for the simple case of grid world of dimension d (Figure 1) there exist O(d!2

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1706.02999

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

Add feedback

[P] Conditional density estimation using Kernel Mixture Networks, theory implementation in TF • r/MachineLearning

@machinelearnbotJun-8-2017, 09:55:18 GMT

Me and a colleague were intrigued by a paper that was released 2-3 weeks ago by LucaAmbrogioni. Density estimation is useful in a lot of problems, so we decided to implement this paper, explain the whole concept in this blog post and looked for some potential extensions.

artificial intelligence, conditional density estimation, social media, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.80)

Add feedback

Collaborative Filtering with Side Information: a Gaussian Process Perspective

Kim, Hyunjik, Lu, Xiaoyu, Flaxman, Seth, Teh, Yee Whye

arXiv.org Machine LearningJun-8-2017

We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression. Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems. Moreover we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.

kernel, side information, tgp, (14 more...)

arXiv.org Machine Learning

1605.07025

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

On learning the structure of Bayesian Networks and submodular function maximization

Caravagna, Giulio, Ramazzotti, Daniele, Sanguinetti, Guido

arXiv.org Machine LearningJun-7-2017

Learning the structure of dependencies among multiple random variables is a problem of considerable theoretical and practical interest. In practice, score optimisation with multiple restarts provides a practical and surprisingly successful solution, yet the conditions under which this may be a well founded strategy are poorly understood. In this paper, we prove that the problem of identifying the structure of a Bayesian Network via regularised score optimisation can be recast, in expectation, as a submodular optimisation problem, thus guaranteeing optimality with high probability. This result both explains the practical success of optimisation heuristics, and suggests a way to improve on such algorithms by artificially simulating multiple data sets via a bootstrap procedure. We show on several synthetic data sets that the resulting algorithm yields better recovery performance than the state of the art, and illustrate in a real cancer genomic study how such an approach can lead to valuable practical insights.

artificial intelligence, hill climbing, machine learning, (16 more...)

arXiv.org Machine Learning

1706.02386

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Add feedback

Anytime Monte Carlo

Murray, Lawrence M., Singh, Sumeetpal, Jacob, Pierre E., Lee, Anthony

arXiv.org Machine LearningJun-7-2017

A Monte Carlo algorithm typically simulates some prescribed number of samples, taking some random real time to complete the computations necessary. This work considers the converse: to impose a real-time budget on the computation, so that the number of samples simulated is random. To complicate matters, the real time taken for each simulation may depend on the sample produced, so that the samples themselves are not independent of their number, and a length bias with respect to compute time is apparent. This is especially problematic when a Markov chain Monte Carlo (MCMC) algorithm is used and the final state of the Markov chain---rather than an average over all states---is required. The length bias does not diminish with the compute budget in this case. It occurs, for example, in sequential Monte Carlo (SMC) algorithms. We propose an anytime framework to address the concern, using a continuous-time Markov jump process to study the progress of the computation in real time. We show that the length bias can be eliminated for any MCMC algorithm by using a multiple chain construction. The utility of this construction is demonstrated on a large-scale SMC-squared implementation, using four billion particles distributed across a cluster of 128 graphics processing units on the Amazon EC2 service. The anytime framework imposes a real-time budget on the MCMC move steps within SMC-squared, ensuring that all processors are simultaneously ready for the resampling step, demonstrably reducing wait times and providing substantial control over the total compute budget.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1612.03319

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Fitting Gaussian Process Models in Python

#artificialintelligenceJun-6-2017, 06:50:15 GMT

Written by Chris Fonnesbeck, Assistant Professor of Biostatistics, Vanderbilt University Medical Center. You can view, fork, and play with this project on the Domino data science platform. A common applied statistics task involves building regression models to characterize non-linear relationships between variables. It is possible to fit such models by assuming a particular non-linear functional form, such as a sinusoidal, exponential, or polynomial function, to describe one variable's response to the variation in another. Unless this relationship is obvious from the outset, however, it involves possibly extensive model selection procedures to ensure the most appropriate model is retained. Alternatively, a non-parametric approach can be adopted by defining a set of knots across the variable space and use a spline or kernel regression to describe arbitrary non-linear relationships.

artificial intelligence, covariance function, machine learning, (18 more...)

#artificialintelligence

Industry: Health & Medicine > Health Care Providers & Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Implementing a Bayes Filter in a Neural Circuit: The Case of Unknown Stimulus Dynamics

Sokoloski, Sacha

arXiv.org Machine LearningJun-6-2017

In order to interact intelligently with objects in the world, animals must first transform neural population responses into estimates of the dynamic, unknown stimuli which caused them. The Bayesian solution to this problem is known as a Bayes filter, which applies Bayes' rule to combine population responses with the predictions of an internal model. In this paper we present a method for learning to approximate a Bayes filter when the stimulus dynamics are unknown. To do this we use the inferential properties of probabilistic population codes to compute Bayes' rule, and train a neural network to compute approximate predictions by the method of maximum likelihood. In particular, we perform stochastic gradient descent on the negative log-likelihood with a novel approximation of the gradient. We demonstrate our methods on a finite-state, a linear, and a nonlinear filtering problem, and show how the hidden layer of the neural network develops tuning curves which are consistent with findings in experimental neuroscience.

artificial intelligence, machine learning, neural circuit, (17 more...)

arXiv.org Machine Learning

1512.07839

Country: North America (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Ten Steps of EM Suffice for Mixtures of Two Gaussians

Daskalakis, Constantinos, Tzamos, Christos, Zampetakis, Manolis

arXiv.org Machine LearningJun-5-2017

The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1\% error estimation of the means. In the finite sample regime, we show that, under a random initialization, $\tilde{O}(d/\epsilon^2)$ samples suffice to compute the unknown vectors to within $\epsilon$ in Mahalanobis distance, where $d$ is the dimension. In particular, the error rate of the EM based estimator is $\tilde{O}\left(\sqrt{d \over n}\right)$ where $n$ is the number of samples, which is optimal up to logarithmic factors.

artificial intelligence, machine learning, tanh, (18 more...)

arXiv.org Machine Learning

1609.00368

Country: North America > United States (0.28)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Experimental Design for Non-Parametric Correction of Misspecified Dynamical Models

Shulkind, Gal, Horesh, Lior, Avron, Haim

arXiv.org Machine LearningJun-4-2017

We consider a class of misspecified dynamical models where the governing term is only approximately known. Under the assumption that observations of the system's evolution are accessible for various initial conditions, our goal is to infer a non-parametric correction to the misspecified driving term such as to faithfully represent the system dynamics and devise system evolution predictions for unobserved initial conditions. We model the unknown correction term as a Gaussian Process and analyze the problem of efficient experimental design to find an optimal correction term under constraints such as a limited experimental budget. We suggest a novel formulation for experimental design for this Gaussian Process and show that approximately optimal (up to a constant factor) designs may be efficiently derived by utilizing results from the literature on submodular optimization. Our numerical experiments exemplify the effectiveness of these techniques.

artificial intelligence, experiment, machine learning, (18 more...)

arXiv.org Machine Learning

1705.00956

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback