AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

Multivariate response and parsimony for Gaussian cluster-weighted models

Dang, Utkarsh J., Punzo, Antonio, McNicholas, Paul D., Ingrassia, Salvatore, Browne, Ryan P.

arXiv.org Machine LearningFeb-26-2016

A family of parsimonious Gaussian cluster-weighted models is presented. This family concerns a multivariate extension to cluster-weighted modelling that can account for correlations between multivariate responses. Parsimony is attained by constraining parts of an eigen-decomposition imposed on the component covariance matrices. A sufficient condition for identifiability is provided and an expectation-maximization algorithm is presented for parameter estimation. Model performance is investigated on both synthetic and classical real data sets and compared with some popular approaches. Finally, accounting for linear dependencies in the presence of a linear regression structure is shown to offer better performance, vis-\`{a}-vis clustering, over existing methodologies.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1411.056

Country:

North America > United States (1.00)
Europe (0.67)
North America > Canada > Ontario (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood

Leppä-aho, Janne, Pensar, Johan, Roos, Teemu, Corander, Jukka

arXiv.org Machine LearningFeb-25-2016

We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restricted to decomposable graphs or require specification of a tuning parameter that may have a substantial impact on learned structures. By combining a simple sparsity inducing prior for the graph structures with a default reference prior for the model parameters, we obtain a fast and easily applicable scoring function that works well for even high-dimensional data. We demonstrate the favourable performance of our approach by large-scale comparisons against the leading methods for learning non-decomposable Gaussian graphical models. A theoretical justification for our method is provided by showing that it yields a consistent estimator of the graph structure.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1016/j.ijar.2017.01.001

1602.07863

Country: Europe > Finland (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Asymptotic consistency and order specification for logistic classifier chains in multi-label learning

Teisseyre, Paweł

arXiv.org Machine LearningFeb-24-2016

Machine Learning manuscript No. (will be inserted by the editor)Asymptotic consistency and order specification for logistic classifier chains in multi-label learning Paweł T eisseyre Received: date / Accepted: date Abstract Classifier chains are popular and effective method to tackle a multi-label classification problem. The aim of this paper is to study the asymptotic properties of the chain model in which the conditional probabilities are of the logistic form. In particular we find conditions on the number of labels and the distribution of feature vector under which the estimated mode of the joint distribution of labels converges to the true mode. Best of our knowledge, this important issue has not yet been studied in the context of multi-label learning. We also investigate how the order of model building in a chain influences the estimation of the joint distribution of labels. We establish the link between the problem of incorrect ordering in the chain and incorrect model specification. We propose a procedure of determining the optimal ordering of labels in the chain, which is based on using measures of correct specification and allows to find the ordering such that the consecutive logistic models are best possibly specified. The other important question raised in this paper is how accurately can we estimate the joint posterior probability when the ordering of labels is wrong or the logistic models in the chain are incorrectly specified. The numerical experiments illustrate the theoretical results. Keywords classifier chains· logistic regression· joint mode estimation· label ordering· asymptotic consistency 1 Introduction In multi-label classification the task is to automatically assign an object to multiple categories based on its characteristics. Each object of our interest is described by a feature vector x belonging to p-dimensional space and vector of K labels y ( y 1,..., y K)′ . In this paper we consider binary labels such thaty k 1 indicates that the considered object belongs to k-th category or has the k-th property. The issue has recently attracted significant attention, motivated by an increasing number of applications such as image and video annotationPaweł Teisseyre Institute of Computer Science, Polish Academy of Sciences Jana Kazimierza 5 01-248 Warsaw, Poland Tel.: 48-22-380-05-55 Email: teisseyrep@ipipan.waw.pl

artificial intelligence, logistic model, machine learning, (15 more...)

arXiv.org Machine Learning

1602.07466

Country: Europe > Poland > Masovia Province > Warsaw (0.24)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Max-Margin Nonparametric Latent Feature Models for Link Prediction

Zhu, Jun, Song, Jiaming, Chen, Bei

arXiv.org Machine LearningFeb-24-2016

Link prediction is a fundamental task in statistical network analysis. Recent advances have been made on learning flexible nonparametric Bayesian latent feature models for link prediction. In this paper, we present a max-margin learning method for such nonparametric latent feature relational models. Our approach attempts to unite the ideas of max-margin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction. It inherits the advances of nonparametric Bayesian methods to infer the unknown latent social dimension, while for discriminative link prediction, it adopts the max-margin learning principle by minimizing a hinge-loss using the linear expectation operator, without dealing with a highly nonlinear link likelihood function. For posterior inference, we develop an efficient stochastic variational inference algorithm under a truncated mean-field assumption. Our methods can scale up to large-scale real networks with millions of entities and tens of millions of positive links. We also provide a full Bayesian formulation, which can avoid tuning regularization hyper-parameters. Experimental results on a diverse range of real datasets demonstrate the benefits inherited from max-margin learning and Bayesian nonparametric inference.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1602.07428

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(2 more...)

Add feedback

Recurrent Gaussian Processes

Mattos, César Lincoln C., Dai, Zhenwen, Damianou, Andreas, Forth, Jeremy, Barreto, Guilherme A., Lawrence, Neil D.

arXiv.org Machine LearningFeb-24-2016

We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In such context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce a RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising obtained results indicate that our RGP model maintains its highly flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1511.06644

Country:

South America > Brazil (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Unsupervised Ensemble Learning with Dependent Classifiers

Jaffe, Ariel, Fetaya, Ethan, Nadler, Boaz, Jiang, Tingting, Kluger, Yuval

arXiv.org Machine LearningFeb-23-2016

In unsupervised ensemble learning, one obtains predictions from multiple sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to assess it. The task is to combine these possibly conflicting predictions into an accurate meta-learner. Most works to date assumed perfect diversity between the different sources, a property known as conditional independence. In realistic scenarios, however, this assumption is often violated, and ensemble learners based on it can be severely sub-optimal. The key challenges we address in this paper are:\ (i) how to detect, in an unsupervised manner, strong violations of conditional independence; and (ii) construct a suitable meta-learner. To this end we introduce a statistical model that allows for dependencies between classifiers. Our main contributions are the development of novel unsupervised methods to detect strongly dependent classifiers, better estimate their accuracies, and construct an improved meta-learner. Using both artificial and real datasets, we showcase the importance of taking classifier dependencies into account and the competitive performance of our approach.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

1510.0583

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.89)

Add feedback

Dynamic Filtering of Time-Varying Sparse Signals via l1 Minimization

Charles, Adam, Balavoine, Aurele, Rozell, Christopher

arXiv.org Machine LearningFeb-22-2016

Despite the importance of sparsity signal models and the increasing prevalence of high-dimensional streaming data, there are relatively few algorithms for dynamic filtering of time-varying sparse signals. Of the existing algorithms, fewer still provide strong performance guarantees. This paper examines two algorithms for dynamic filtering of sparse signals that are based on efficient l1 optimization methods. We first present an analysis for one simple algorithm (BPDN-DF) that works well when the system dynamics are known exactly. We then introduce a novel second algorithm (RWL1-DF) that is more computationally complex than BPDN-DF but performs better in practice, especially in the case where the system dynamics model is inaccurate. Robustness to model inaccuracy is achieved by using a hierarchical probabilistic data model and propagating higher-order statistics from the previous estimate (akin to Kalman filtering) in the sparse inference process. We demonstrate the properties of these algorithms on both simulated data as well as natural video sequences. Taken together, the algorithms presented in this paper represent the first strong performance analysis of dynamic filtering algorithms for time-varying sparse signals as well as state-of-the-art performance in this emerging application.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2016.2586745

1507.06145

Country: North America > United States (0.67)

Genre: Research Report (0.84)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Learning to classify with possible sensor failures

Xie, Tianpei, Nasrabadi, Nasser M., Hero, Alfred O.

arXiv.org Machine LearningFeb-22-2016

Large margin classifiers, such as the support vector machine (SVM) [1] and the maximum entropy discrimination (MED) classifier [2], have enjoyed great popularity in the signal processing and machine learning communities due to their broad applicability, robust performance, and the availability of fast software implementations. When the training data is representative of the test data, the performance of MED/SVM has theoretical guarantees that have been validated in practice [1], [3], [4]. Moreover, since the decision boundary of the MED/SVM is solely defined by a few support vectors, the algorithm can tolerate random feature distortions and perturbations. However, in many real applications, anomalous measurements are inherent to the data set due to strong environmental noise or possible sensor failures. Such anomalies arise in industrial process monitoring, video surveillance, tactical multi-modal sensing, and, more generally, any application that involves unattended sensors in difficult environments (Figure 1).

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/ICASSP.2014.6854029

1507.0454

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Predictive Entropy Search for Multi-objective Bayesian Optimization

Hernández-Lobato, Daniel, Hernández-Lobato, José Miguel, Shah, Amar, Adams, Ryan P.

arXiv.org Machine LearningFeb-21-2016

We present PESMO, a Bayesian method for identifying the Pareto set of multi-objective optimization problems, when the functions are expensive to evaluate. The central idea of PESMO is to choose evaluation points so as to maximally reduce the entropy of the posterior distribution over the Pareto set. Critically, the PESMO multi-objective acquisition function can be decomposed as a sum of objective-specific acquisition functions, which enables the algorithm to be used in \emph{decoupled} scenarios in which the objectives can be evaluated separately and perhaps with different costs. This decoupling capability also makes it possible to identify difficult objectives that require more evaluations. PESMO also offers gains in efficiency, as its cost scales linearly with the number of objectives, in comparison to the exponential cost of other methods. We compare PESMO with other related methods for multi-objective Bayesian optimization on synthetic and real-world problems. The results show that PESMO produces better recommendations with a smaller number of evaluations of the objectives, and that a decoupled evaluation can lead to improvements in performance, particularly when the number of objectives is large.

artificial intelligence, machine learning, objective, (19 more...)

arXiv.org Machine Learning

1511.05467

Country: Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

Statistical Mechanics of High-Dimensional Inference

Advani, Madhu, Ganguli, Surya

arXiv.org Machine LearningFeb-21-2016

To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are fundamental limits on the accuracy of parameter inference, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density $\alpha = \frac{N}{P}\rightarrow \infty$. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite $\alpha$. We formulate and analyze high-dimensional inference as a problem in the statistical physics of quenched disorder. Our analysis uncovers fundamental limits on the accuracy of inference in high dimensions, and reveals that widely cherished inference algorithms like maximum likelihood (ML) and maximum-a posteriori (MAP) inference cannot achieve these limits. We further find optimal, computationally tractable algorithms that can achieve these limits. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than MAP and ML, while still outperforming them. For example, such optimal algorithms can lead to as much as a 20% reduction in the amount of data to achieve the same performance relative to MAP. Moreover, our analysis reveals simple relations between optimal high dimensional inference and low dimensional scalar Bayesian inference, insights into the nature of generalization and predictive power in high dimensions, information theoretic limits on compressed sensing, phase transitions in quadratic inference, and connections to central mathematical objects in convex optimization theory and random matrix theory.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1103/PhysRevX.6.031034

1601.0465

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback