AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

Frank-Wolfe Stein Sampling

Futami, Futoshi, Cui, Zhenghang, Sato, Issei, Sugiyama, Masashi

arXiv.org Machine LearningMay-21-2018

In Bayesian inference, the posterior distributions are difficult to obtain analytically for complex models such as neural networks. Variational inference usually uses a parametric distribution to approximate, from which we can easily draw samples. Recently discrete approximation by particles has attracted attention because of its expressive ability. An example is Stein variational gradient descent (SVGD), which iteratively optimizes particles. Although SVGD has been shown to be computationally efficient empirically, its theoretical properties have not been clarified yet and no finite sample bound of a convergence rate is known. Another example is Stein points (SP), which minimizes kernelized Stein discrepancy directly. The finite sample bound of SP is $\mathcal{O}(\sqrt{\log{N}/N})$ for $N$ particles, which is computationally inefficient empirically, especially in high-dimensional problems. In this paper, we propose a novel method named \emph{Frank-Wolfe Stein sampling}, which minimizes the maximum mean discrepancy in a greedy way. Our method is computationally efficient empirically and theoretically achieves a faster convergence rate, $\mathcal{O}(e^{-N})$. Numerical experiments show the superiority of our method.

artificial intelligence, machine learning, particle, (17 more...)

arXiv.org Machine Learning

1805.07912

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)

Add feedback

Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

Ritter, Hippolyt, Botev, Aleksandar, Barber, David

arXiv.org Machine LearningMay-20-2018

We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1805.0781

Genre: Research Report (1.00)

Industry: Education (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Minimax Lower Bounds for Cost Sensitive Classification

Kamalaruban, Parameswaran, Williamson, Robert C.

arXiv.org Machine LearningMay-20-2018

The central problem of this paper is the cost-sensitive binary classification problem, where different costs are associated with different types of mistakes. Several important machine learning applications such as medical decision making, targeted marketing, and intrusion detection can be naturally formalized as costsensitive classification setup ([1]). In these domains, the cost of missing a target is much higher than that of a false-positive, and classifiers that do not take misclassification costs into account do not perform well. The cost-sensitive classification problem has been extensively studied, and people have developed efficient algorithms with provable guarantees on the (generalization) error [6, 9, 26, 27, 11, 4]. These methods primarily take existing classification methods based on empirical risk minimization and try to adapt them in various ways to be sensitive to these misclassification costs. Despite all these efforts, the understanding of the fundamental limits of this problem is still missing. In this paper, we study the hardness of this problem by obtaining minimax lower bounds. In particular, we are interested in understanding how the cost parameter influences the hardness or complexity of the cost-sensitive classification. Minimax Lower Bounds Understanding the hardness or fundamental limits of a learning problem is important for practice for the following reasons: - They give an estimate on the number of samples required for a good performance of a learning algorithm.

artificial intelligence, classification problem, machine learning, (14 more...)

arXiv.org Machine Learning

1805.07723

Genre: Research Report (0.50)

Industry:

Education (0.49)
Health & Medicine (0.48)
Law Enforcement & Public Safety (0.48)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)

Add feedback

An Online RFID Localization in the Manufacturing Shopfloor

Ashfahani, Andri, Pratama, Mahardhika, Lughofer, Edwin, Cai, Qing, Sheng, Huang

arXiv.org Artificial IntelligenceMay-20-2018

Radio Frequency Identification technology has gained popularity for cheap and easy deployment. In the realm of manufacturing shopfloor, it can be used to track the location of manufacturing objects to achieve better efficiency. The underlying challenge of localization lies in the non-stationary characteristics of manufacturing shopfloor which calls for an adaptive life-long learning strategy in order to arrive at accurate localization results. This paper presents an evolving model based on a novel evolving intelligent system, namely evolving Type-2 Quantum Fuzzy Neural Network (eT2QFNN), which features an interval type-2 quantum fuzzy set with uncertain jump positions. The quantum fuzzy set possesses a graded membership degree which enables better identification of overlaps between classes. The eT2QFNN works fully in the evolving mode where all parameters including the number of rules are automatically adjusted and generated on the fly. The parameter adjustment scenario relies on decoupled extended Kalman filter method. Our numerical study shows that eT2QFNN is able to deliver comparable accuracy compared to state-of-the-art algorithms.

artificial intelligence, fuzzy logic, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1805.07715

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Continuing Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Networks > Sensor Networks (0.88)

Add feedback

r/MachineLearning - [R] To Build Truly Intelligent Machines, Teach Them Cause and Effect

@machinelearnbotMay-19-2018, 15:40:37 GMT

Just FYI: "Judea Pearl created the representational and computational foundation for the processing of information under uncertainty. He is credited with the invention of Bayesian networks,..." His wiki page doesn't list his contributions, it links to other people's summaries of his contributions. The guy you say is stuck in an old paradigm is one of the main inventors of the new paradigm.

artificial intelligence, intelligent machine, machinelearning, (2 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.84)

Add feedback

Heterogeneous Multi-output Gaussian Process Prediction

Moreno-Muñoz, Pablo, Artés-Rodríguez, Antonio, Álvarez, Mauricio A.

arXiv.org Machine LearningMay-19-2018

Multi-output Gaussian processes (MOGP) generalise the powerful Gaussian process (GP) predictive model to the vector-valued random field setup (Alvarez et al., 2012). It has been experimentally shown that by simultaneously exploiting correlations between multiple outputs and across the input space, it is possible to provide better predictions, particularly in scenarios with missing or noisy data (Bonilla et al., 2008; Dai et al., 2017). The main focus in the literature for MOGP has been on the definition of a suitable cross-covariance function between the multiple outputs that allows for the treatment of outputs as a single GP with a properly defined covariance function (Alvarez et al., 2012). The two classical alternatives to define such cross-covariance functions are the linear model of coregionalisation (LMC) (Journel and Huijbregts, 1978) and process convolutions (Higdon, 2002). In the former case, each output corresponds to a weighted sum of shared latent random functions.

artificial intelligence, machine learning, modeling & simulation, (15 more...)

arXiv.org Machine Learning

1805.07633

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Distributionally Robust Inverse Covariance Estimation: The Wasserstein Shrinkage Estimator

Nguyen, Viet Anh, Kuhn, Daniel, Esfahani, Peyman Mohajerin

arXiv.org Machine LearningMay-18-2018

We introduce a distributionally robust maximum likelihood estimation model with a Wasserstein ambiguity set to infer the inverse covariance matrix of a $p$-dimensional Gaussian random vector from $n$ independent samples. The proposed model minimizes the worst case (maximum) of Stein's loss across all normal reference distributions within a prescribed Wasserstein distance from the normal distribution characterized by the sample mean and the sample covariance matrix. We prove that this estimation problem is equivalent to a semidefinite program that is tractable in theory but beyond the reach of general purpose solvers for practically relevant problem dimensions $p$. In the absence of any prior structural information, the estimation problem has an analytical solution that is naturally interpreted as a nonlinear shrinkage estimator. Besides being invertible and well-conditioned even for $p>n$, the new shrinkage estimator is rotation-equivariant and preserves the order of the eigenvalues of the sample covariance matrix. These desirable properties are not imposed ad hoc but emerge naturally from the underlying distributionally robust optimization model. Finally, we develop a sequential quadratic approximation algorithm for efficiently solving the general estimation problem subject to conditional independence constraints typically encountered in Gaussian graphical models.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1805.07194

Country:

Europe (0.67)
North America > United States (0.67)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.67)
Health & Medicine > Therapeutic Area > Hematology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.56)

Add feedback

Approximate Model Counting by Partial Knowledge Compilation

Lai, Yong

arXiv.org Artificial IntelligenceMay-18-2018

Model counting is the problem of computing the number of satisfying assignments of a given propositional formula. Although exact model counters can be naturally furnished by most of the knowledge compilation (KC) methods, in practice, they fail to generate the compiled results for the exact counting of models for certain formulas due to the explosion in sizes. Decision-DNNF is an important KC language that captures most of the practical compilers. We propose a generalized Decision-DNNF (referred to as partial Decision-DNNF) via introducing a class of new leaf vertices (called unknown vertices), and then propose an algorithm called PartialKC to generate randomly partial Decision-DNNF formulas from the given formulas. An unbiased estimate of the model number can be computed via a randomly partial Decision-DNNF formula. Each calling of PartialKC consists of multiple callings of MicroKC, while each of the latter callings is a process of importance sampling equipped with KC technologies. The experimental results show that PartialKC is more accurate than both SampleSearch and SearchTreeSampler, PartialKC scales better than SearchTreeSampler, and the KC technologies can obviously accelerate sampling.

artificial intelligence, decision-dnnf formula, formula, (16 more...)

arXiv.org Artificial Intelligence

1805.0718

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Add feedback

PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Dumitrascu, Bianca, Feng, Karen, Engelhardt, Barbara E

arXiv.org Machine LearningMay-18-2018

We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Polya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Polya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1805.07458

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Accurate Kernel Learning for Linear Gaussian Markov Processes using a Scalable Likelihood Computation

de Waele, Stijn

arXiv.org Machine LearningMay-18-2018

We report an exact likelihood computation for Linear Gaussian Markov processes that is more scalable than existing algorithms for complex models and sparsely sampled signals. Better scaling is achieved through elimination of repeated computations in the Kalman likelihood, and by using the diagonalized form of the state transition equation. Using this efficient computation, we study the accuracy of kernel learning using maximum likelihood and the posterior mean in a simulation experiment. The posterior mean with a reference prior is more accurate for complex models and sparse sampling. Because of its lower computation load, the maximum likelihood estimator is an attractive option for more densely sampled signals and lower order models. We confirm estimator behavior in experimental data through their application to speleothem data.

artificial intelligence, computation, machine learning, (15 more...)

arXiv.org Machine Learning

1805.07346

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Add feedback