AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

The Poisson transform for unnormalised statistical models

Barthelmé, Simon, Chopin, Nicolas

arXiv.org Machine LearningNov-27-2014

Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of a unnormalised model on a space $\Omega$ can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on $\Omega$. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which includes for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parameteric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyv\"arinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.

artificial intelligence, exp, machine learning, (18 more...)

arXiv.org Machine Learning

1406.2839

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Add feedback

Noise Benefits in Expectation-Maximization Algorithms

Osoba, Osonde Adekorede

arXiv.org Machine LearningNov-24-2014

This dissertation shows that careful injection of noise into sample data can substantially speed up Expectation-Maximization algorithms. Expectation-Maximization algorithms are a class of iterative algorithms for extracting maximum likelihood estimates from corrupted or incomplete data. The convergence speed-up is an example of a noise benefit or "stochastic resonance" in statistical signal processing. The dissertation presents derivations of sufficient conditions for such noise-benefits and demonstrates the speed-up in some ubiquitous signal-processing algorithms. These algorithms include parameter estimation for mixture models, the $k$-means clustering algorithm, the Baum-Welch algorithm for training hidden Markov models, and backpropagation for training feedforward artificial neural networks. This dissertation also analyses the effects of data and model corruption on the more general Bayesian inference estimation framework. The main finding is a theorem guaranteeing that uniform approximators for Bayesian model functions produce uniform approximators for the posterior pdf via Bayes theorem. This result also applies to hierarchical and multidimensional Bayesian models.

neural network, noisy expectation maximization, upstream oil & gas, (26 more...)

arXiv.org Machine Learning

1411.6622

Country:

North America > United States > California (0.45)
Europe > United Kingdom > England (0.14)
Asia > Middle East > Israel > Mediterranean Sea (0.13)
North America > United States > Texas (0.13)

Genre:

Summary/Review (1.00)
Research Report (1.00)
Overview (0.92)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

bartMachine: Machine Learning with Bayesian Additive Regression Trees

Kapelner, Adam, Bleich, Justin

arXiv.org Machine LearningNov-24-2014

Ensemble-of-trees methods have become popular choices for forecasting in both regression and classification problems. Algorithms such as random forests (Breiman 2001) and stochastic gradient boosting (Friedman 2002) are two well-established and widely employed procedures. Recent advances in ensemble methods include dynamic trees (Taddy, Gramacy, and Polson 2011) and Bayesian additive regression trees (BART, Chipman, George, and McCulloch 2010), which depart from predecessors in that they rely on an underlying Bayesian probability model rather than a pure algorithm. BART has demonstrated substantial promise in a wide variety of simulations and real world applications such as predicting avalanches on mountain roads (Blattenberger and Fowles 2014), predicting how transcription factors interact with DNA (Zhou and Liu 2008) and predicting movie box office revenues (Eliashberg 2010). This paper introduces bartMachine, a new R (R Core Team 2014) package available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package

artificial intelligence, bartmachine, machine learning, (17 more...)

arXiv.org Machine Learning

1312.2171

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Automobiles & Trucks (0.69)
Health & Medicine (0.67)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

A Greedy, Flexible Algorithm to Learn an Optimal Bayesian Network Structure

Soltani, Amir Arsalan

arXiv.org Machine LearningNov-24-2014

In this report paper we first present a report of the Advanced Machine Learning Course Project on the provided data set and then present a novel heuristic algorithm for exact Bayesian network (BN) structure discovery that uses decomposable scoring functions. Our algorithm follows a different approach to solve the problem of BN structure discovery than the previously used methods such as Dynamic Programming (DP) and Branch and Bound to reduce the search space and find the global optima space for the problem. The algorithm we propose has some degree of flexibility that can make it more or less greedy. The more the algorithm is set to be greedy, the more the speed of the algorithm will be, and the less optimal the final structure. Our algorithm runs in a much less time than the previously known methods and guarantees to have an optimality of close to 99%.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

1411.6651

Country: North America > United States > New York (0.28)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Approximate evaluation of marginal association probabilities with belief propagation

Williams, Jason L., Lau, Roslyn A.

arXiv.org Artificial IntelligenceNov-19-2014

Data association, the problem of reasoning over correspondence between targets and measurements, is a fundamental problem in tracking. This paper presents a graphical model formulation of data association and applies an approximate inference method, belief propagation (BP), to obtain estimates of marginal association probabilities. We prove that BP is guaranteed to converge, and bound the number of iterations necessary. Experiments reveal a favourable comparison to prior methods in terms of accuracy and computational complexity.

algorithm, artificial intelligence, belief revision, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TAES.2014.120568

1209.6299

Country:

North America > United States > Massachusetts (0.28)
Oceania > Australia > Queensland (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

SIMD Parallel MCMC Sampling with Applications for Big-Data Bayesian Analytics

Mahani, Alireza S., Sharabiani, Mansour T. A.

arXiv.org Artificial IntelligenceNov-19-2014

Computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing applications for big data analytics, necessitate both the identification of potential opportunities to parallelize techniques such as MCMC sampling, and the development of general strategies for mapping such parallel algorithms to modern CPUs in order to elicit the performance up the compute-based and/or memory-based hardware limits. Two opportunities for Single-Instruction Multiple-Data (SIMD) parallelization of MCMC sampling for probabilistic graphical models are presented. In exchangeable models with many observations such as Bayesian Generalized Linear Models, child-node contributions to the conditional posterior of each node can be calculated concurrently. In undirected graphs with discrete nodes, concurrent sampling of conditionally-independent nodes can be transformed into a SIMD form. High-performance libraries with multi-threading and vectorization capabilities can be readily applied to such SIMD opportunities to gain decent speedup, while a series of high-level source-code and runtime modifications provide further performance boost by reducing parallelization overhead and increasing data locality for NUMA architectures. For big-data Bayesian GLM graphs, the end-result is a routine for evaluating the conditional posterior and its gradient vector that is 5 times faster than a naive implementation using (built-in) multi-threaded Intel MKL BLAS, and reaches within the striking distance of the memory-bandwidth-induced hardware limit. The proposed optimization strategies improve the scaling of performance with number of cores and width of vector units (applicable to many-core SIMD processors such as Intel Xeon Phi and GPUs), resulting in cost-effectiveness, energy efficiency, and higher speed on multi-core x86 processors.

data mining, machine learning, node, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.csda.2015.02.010

1310.1537

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

The NLMS algorithm with time-variant optimum stepsize derived from a Bayesian network perspective

Huemmer, Christian, Maas, Roland, Kellermann, Walter

arXiv.org Machine LearningNov-18-2014

In this article, we derive a new stepsize adaptation for the normalized least mean square algorithm (NLMS) by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in observation and process equation, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. As main difference, the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both, white noise and male speech as input signal.

artificial intelligence, bayesian inference, machine learning, (12 more...)

arXiv.org Machine Learning

1411.4834

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

Low, Kian Hsiang, Yu, Jiangbo, Chen, Jie, Jaillet, Patrick

arXiv.org Machine LearningNov-17-2014

The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rank-cum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.

approximation, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1411.451

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Data Science > Data Mining > Big Data (0.40)

Add feedback

Robust Kernel Density Estimation by Scaling and Projection in Hilbert Space

Vandermeulen, Robert A., Scott, Clayton D.

arXiv.org Machine LearningNov-17-2014

While robust parameter estimation has been well studied in parametric density estimation, there has been little investigation into robust density estimation in the nonparametric setting. We present a robust version of the popular kernel density estimator (KDE). As with other estimators, a robust version of the KDE is useful since sample contamination is a common issue with datasets. What "robustness" means for a nonparametric density estimate is not straightforward and is a topic we explore in this paper. To construct a robust KDE we scale the traditional KDE and project it to its nearest weighted KDE in the $L^2$ norm. This yields a scaled and projected KDE (SPKDE). Because the squared $L^2$ norm penalizes point-wise errors superlinearly this causes the weighted KDE to allocate more weight to high density regions. We demonstrate the robustness of the SPKDE with numerical experiments and a consistency result which shows that asymptotically the SPKDE recovers the uncontaminated density under sufficient conditions on the contamination.

artificial intelligence, machine learning, tar, (13 more...)

arXiv.org Machine Learning

1411.4378

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.81)

Add feedback

A unifying framework for relaxations of the causal assumptions in Bell's theorem

Chaves, Rafael, Kueng, Richard, Brask, Jonatan Bohr, Gross, David

arXiv.org Machine LearningNov-17-2014

Bell's Theorem shows that quantum mechanical correlations can violate the constraints that the causal structure of certain experiments impose on any classical explanation. It is thus natural to ask to which degree the causal assumptions -- e.g. locality or measurement independence -- have to be relaxed in order to allow for a classical description of such experiments. Here, we develop a conceptual and computational framework for treating this problem. We employ the language of Bayesian networks to systematically construct alternative causal structures and bound the degree of relaxation using quantitative measures that originate from the mathematical theory of causality. The main technical insight is that the resulting problems can often be expressed as computationally tractable linear programs. We demonstrate the versatility of the framework by applying it to a variety of scenarios, ranging from relaxations of the measurement independence, locality and bilocality assumptions, to a novel causal interpretation of CHSH inequality violations.

artificial intelligence, correlation, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1103/PhysRevLett.114.140403

1411.4648

Country:

Europe > Germany (0.28)
Europe > Switzerland (0.28)
North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback