AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

Shah, Amar, Knowles, David A., Ghahramani, Zoubin

arXiv.org Machine LearningJun-26-2015

Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we investigate whether the understanding gleaned from LDA applies in the setting of sparse latent factor models, specifically beta process factor analysis (BPFA). We demonstrate that the big picture is consistent: using Gibbs sampling within SVI to maintain certain posterior dependencies is extremely effective. However, we find that different posterior dependencies are important in BPFA relative to LDA. Particularly, approximations able to model intra-local variable dependence perform best.

approximation, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

1506.0818

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Collaboratively Learning Preferences from Ordinal Data

Oh, Sewoong, Thekumparampil, Kiran K., Xu, Jiaming

arXiv.org Machine LearningJun-25-2015

In applications such as recommendation systems and revenue management, it is important to predict preferences on items that have not been seen by a user or predict outcomes of comparisons among those that have never been compared. A popular discrete choice model of multinomial logit model captures the structure of the hidden preferences with a low-rank matrix. In order to predict the preferences, we want to learn the underlying model from noisy observations of the low-rank matrix, collected as revealed preferences in various forms of ordinal data. A natural approach to learn such a model is to solve a convex relaxation of nuclear norm minimization. We present the convex relaxation approach in two contexts of interest: collaborative ranking and bundled choice modeling. In both cases, we show that the convex relaxation is minimax optimal. We prove an upper bound on the resulting error with finite samples, and provide a matching information-theoretic lower bound.

artificial intelligence, machine learning, probability, (17 more...)

arXiv.org Machine Learning

1506.07947

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Objective Variables for Probabilistic Revenue Maximization in Second-Price Auctions with Reserve

Rudolph, Maja R., Ellis, Joseph G., Blei, David M.

arXiv.org Machine LearningJun-24-2015

Many online companies sell advertisement space in second-price auctions with reserve. In this paper, we develop a probabilistic method to learn a profitable strategy to set the reserve price. We use historical auction data with features to fit a predictor of the best reserve price. This problem is delicate - the structure of the auction is such that a reserve price set too high is much worse than a reserve price set too low. To address this we develop objective variables, a new framework for combining probabilistic modeling with optimal decision-making. Objective variables are "hallucinated observations" that transform the revenue maximization task into a regularized maximum likelihood estimation problem, which we solve with an EM algorithm. This framework enables a variety of prediction mechanisms to set the reserve price. As examples, we study objective variable methods with regression, kernelized regression, and neural networks on simulated and real data. Our methods outperform previous approaches both in terms of scalability and profit.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1506.07504

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Marketing (0.34)
Information Technology > Services (0.30)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
(2 more...)

Add feedback

An objective prior that unifies objective Bayes and information-based inference

LaMont, Colin H., Wiggins, Paul A.

arXiv.org Machine LearningJun-24-2015

There are three principle paradigms of statistical inference: (i) Bayesian, (ii) information-based and (iii) frequentist inference. We describe an objective prior (the weighting or $w$-prior) which unifies objective Bayes and information-based inference. The $w$-prior is chosen to make the marginal probability an unbiased estimator of the predictive performance of the model. This definition has several other natural interpretations. From the perspective of the information content of the prior, the $w$-prior is both uniformly and maximally uninformative. The $w$-prior can also be understood to result in a uniform density of distinguishable models in parameter space. Finally we demonstrate the the $w$-prior is equivalent to the Akaike Information Criterion (AIC) for regular models in the asymptotic limit. The $w$-prior appears to be generically applicable to statistical inference and is free of {\it ad hoc} regularization. The mechanism for suppressing complexity is analogous to AIC: model complexity reduces model predictivity. We expect this new objective-Bayes approach to inference to be widely-applicable to machine-learning problems including singular models.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1506.00745

Country:

North America > United States (0.28)
Europe > Netherlands (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Graphs in machine learning: an introduction

Latouche, Pierre, Rossi, Fabrice

arXiv.org Machine LearningJun-23-2015

Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and un-supervised, data can be relational (augmented with one or several global graphs) as described above, or graph valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc. 1 Real networks One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 30s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In Biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in social sciences networks are used to represent relation ties between actors [66, 56, 36, 34]. Other examples include powergrids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network is

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1506.06962

Country:

North America > Canada (0.46)
North America > United States (0.28)
Europe > Belgium (0.28)

Genre: Overview (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Approximate Inference with the Variational Holder Bound

Bouchard, Guillaume, Lakshminarayanan, Balaji

arXiv.org Machine LearningJun-19-2015

We introduce the Variational Holder (VH) bound as an alternative to Variational Bayes (VB) for approximate Bayesian inference. Unlike VB which typically involves maximization of a non-convex lower bound with respect to the variational parameters, the VH bound involves minimization of a convex upper bound to the intractable integral with respect to the variational parameters. Minimization of the VH bound is a convex optimization problem; hence the VH method can be applied using off-the-shelf convex optimization algorithms and the approximation error of the VH bound can also be analyzed using tools from convex optimization literature. We present experiments on the task of integrating a truncated multivariate Gaussian distribution and compare our method to VB, EP and a state-of-the-art numerical integration method for this problem.

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1506.061

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)

Add feedback

Sampling constrained probability distributions using Spherical Augmentation

Lan, Shiwei, Shahbaba, Babak

arXiv.org Machine LearningJun-19-2015

Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit, many copula models, and latent Dirichlet allocation (LDA). Bayesian inference involving probability distributions confined to constrained domains could be quite challenging for commonly used sampling algorithms. In this paper, we propose a novel augmentation technique that handles a wide range of constraints by mapping the constrained domain to a sphere in the augmented space. By moving freely on the surface of this sphere, sampling algorithms handle constraints implicitly and generate proposals that remain within boundaries when mapped back to the original space. Our proposed method, called {Spherical Augmentation}, provides a mathematically natural and computationally efficient framework for sampling from constrained probability distributions. We show the advantages of our method over state-of-the-art sampling algorithms, such as exact Hamiltonian Monte Carlo, using several examples including truncated Gaussian distributions, Bayesian Lasso, Bayesian bridge regression, reconstruction of quantized stationary Gaussian process, and LDA for topic modeling.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Machine Learning

1506.05936

Country: North America > United States > California (0.67)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Tensor Analysis and Fusion of Multimodal Brain Images

Karahan, Esin, Rojas-Lopez, Pedro A., Bringas-Vega, Maria L., Valdes-Hernandez, Pedro A., Valdes-Sosa, Pedro A.

arXiv.org Machine LearningJun-19-2015

Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure where the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network notation in order to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

doi: 10.1109/JPROC.2015.2455028

1506.0604

Country:

Asia (0.67)
North America > United States > California (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(5 more...)

Add feedback

On the accuracy of self-normalized log-linear models

Andreas, Jacob, Rabinovich, Maxim, Klein, Dan, Jordan, Michael I.

arXiv.org Machine LearningJun-18-2015

Calculation of the log-normalizer is a major computational obstacle in applications of log-linear models with large output spaces. The problem of fast normalizer computation has therefore attracted significant attention in the theoretical and applied machine learning literature. In this paper, we analyze a recently proposed technique known as "self-normalization", which introduces a regularization term in training to penalize log normalizers for deviating from zero. This makes it possible to use unnormalized model scores as approximate probabilities. Empirical evidence suggests that self-normalization is extremely effective, but a theoretical understanding of why it should work, and how generally it can be applied, is largely lacking. We prove generalization bounds on the estimated variance of normalizers and upper bounds on the loss in accuracy due to self-normalization, describe classes of input distributions that self-normalize easily, and construct explicit examples of high-variance input distributions. Our theoretical results make predictions about the difficulty of fitting self-normalized models to several classes of distributions, and we conclude with empirical validation of these predictions.

likelihood gap, log-linear model, normalizer, (15 more...)

arXiv.org Machine Learning

1506.04147

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

Multi-Context Models for Reasoning under Partial Knowledge: Generative Process and Inference Grammar

Nobandegani, Ardavan Salehi, Psaromiligkos, Ioannis N.

arXiv.org Machine LearningJun-18-2015

Arriving at the complete probabilistic knowledge of a domain, i.e., learning how all variables interact, is indeed a demanding task. In reality, settings often arise for which an individual merely possesses partial knowledge of the domain, and yet, is expected to give adequate answers to a variety of posed queries. That is, although precise answers to some queries, in principle, cannot be achieved, a range of plausible answers is attainable for each query given the available partial knowledge. In this paper, we propose the Multi-Context Model (MCM), a new graphical model to represent the state of partial knowledge as to a domain. MCM is a middle ground between Probabilistic Logic, Bayesian Logic, and Probabilistic Graphical Models. For this model we discuss: (i) the dynamics of constructing a contradiction-free MCM, i.e., to form partial beliefs regarding a domain in a gradual and probabilistically consistent way, and (ii) how to perform inference, i.e., to evaluate a probability of interest involving some variables of the domain.

artificial intelligence, machine learning, mcm, (18 more...)

arXiv.org Machine Learning

1412.4271

Country: North America (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback