AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

Statistical inference using SGD

Li, Tianyang, Liu, Liu, Kyrillidis, Anastasios, Caramanis, Constantine

arXiv.org Machine LearningNov-19-2017

We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.

artificial intelligence, bayesian inference, machine learning, (14 more...)

arXiv.org Machine Learning

1705.07477

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

The amazing predictive power of conditional probability in Bayes Nets

@machinelearnbotNov-18-2017, 15:35:12 GMT

Using conditional probability gives Bayes Nets strong analytical advantages over traditional regression-based models. This adds to several advantages we discussed in an earlier article. But what is conditional probability and what makes it different? In short, conditional probability means that the effects of one variable depend on, of flow from, the distribution of another variable (or others). The complete state of one variable determines how another acts.

artificial intelligence, bayesian inference, machine learning, (13 more...)

@machinelearnbot

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Robust Synthetic Control

Amjad, Muhammad Jehangir, Shah, Devavrat, Shen, Dennis

arXiv.org Machine LearningNov-18-2017

We present a robust generalization of the synthetic control method for comparative case studies. Like the classical method, we present an algorithm to estimate the unobservable counterfactual of a treatment unit. A distinguishing feature of our algorithm is that of de-noising the data matrix via singular value thresholding, which renders our approach robust in multiple facets: it automatically identifies a good subset of donors, overcomes the challenges of missing data, and continues to work well in settings where covariate information may not be provided. To begin, we establish the condition under which the fundamental assumption in synthetic control-like approaches holds, i.e. when the linear relationship between the treatment unit and the donor pool prevails in both the pre- and post-intervention periods. We provide the first finite sample analysis for a broader class of models, the Latent Variable Model, in contrast to Factor Models previously considered in the literature. Further, we show that our de-noising procedure accurately imputes missing entries, producing a consistent estimator of the underlying signal matrix provided $p = \Omega( T^{-1 + \zeta})$ for some $\zeta > 0$; here, $p$ is the fraction of observed data and $T$ is the time interval of interest. Under the same setting, we prove that the mean-squared-error (MSE) in our prediction estimation scales as $O(\sigma^2/p + 1/\sqrt{T})$, where $\sigma^2$ is the noise variance. Using a data aggregation method, we show that the MSE can be made as small as $O(T^{-1/2+\gamma})$ for any $\gamma \in (0, 1/2)$, leading to a consistent estimator. We also introduce a Bayesian framework to quantify the model uncertainty through posterior probabilities. Our experiments, using both real-world and synthetic datasets, demonstrate that our robust generalization yields an improvement over the classical synthetic control method.

artificial intelligence, machine learning, singular value, (17 more...)

arXiv.org Machine Learning

1711.0694

Country:

Europe > Spain (0.68)
North America > United States > California (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Law (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

General Bayesian Inference over the Stiefel Manifold via the Givens Transform

Pourzanjani, Arya A, Jiang, Richard M, Mitchell, Brian, Atzberger, Paul J, Petzold, Linda R

arXiv.org Machine LearningNov-17-2017

We introduce the Givens Transform, a novel transform between the space of orthonormal matrices and $\mathbb{R}^D$. The Givens Transform allows for the application of any general Bayesian inference algorithm to probabilistic models containing constrained unit-vectors or orthonormal matrix parameters. This includes a variety of matrix factorizations and dimensionality reduction models such as Probabilistic PCA (PPCA), Exponential Family PPCA (BXPCA), and Canonical Correlation Analysis (CCA). While previous Bayesian approaches to these models relied on separate sampling update rules for constrained and unconstrained parameters, the Givens Transform enables the treatment of unit-vectors and orthonormal matrices agnostically as unconstrained parameters. Thus any Bayesian inference algorithm can be used on these models without modification. This opens the door to not just sampling algorithms, but Variational Inference (VI) as well. We illustrate with several examples and supplied code, how the Givens Transform allows end-users to easily build complex models in their favorite Bayesian modeling framework such as Stan, Edward, or PyMC3, a task that was previously intractable due to technical constraints.

artificial intelligence, machine learning, stiefel manifold, (14 more...)

arXiv.org Machine Learning

1710.09443

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)

Add feedback

Rate-Distortion Bounds on Bayes Risk in Supervised Learning

Nokleby, Matthew, Beirami, Ahmad, Calderbank, Robert

arXiv.org Machine LearningNov-17-2017

We present an information-theoretic framework for bounding the number of labeled samples needed to train a classifier in a parametric Bayesian setting. We derive bounds on the average $L_p$ distance between the learned classifier and the true maximum a posteriori classifier, which are well-established surrogates for the excess classification error due to imperfect learning. We provide lower and upper bounds on the rate-distortion function, using $L_p$ loss as the distortion measure, of a maximum a priori classifier in terms of the differential entropy of the posterior distribution and a quantity called the interpolation dimension, which characterizes the complexity of the parametric distribution family. In addition to expressing the information content of a classifier in terms of lossy compression, the rate-distortion function also expresses the minimum number of bits a learning machine needs to extract from training data to learn a classifier to within a specified $L_p$ tolerance. We use results from universal source coding to express the information content in the training data in terms of the Fisher information of the parametric family and the number of training samples available. The result is a framework for computing lower bounds on the Bayes $L_p$ risk. This framework complements the well-known probably approximately correct (PAC) framework, which provides minimax risk bounds involving the Vapnik-Chervonenkis dimension or Rademacher complexity. Whereas the PAC framework provides upper bounds the risk for the worst-case data distribution, the proposed rate-distortion framework lower bounds the risk averaged over the data distribution. We evaluate the bounds for a variety of data models, including categorical, multinomial, and Gaussian models. In each case the bounds are provably tight orderwise, and in two cases we prove that the bounds are tight up to multiplicative constants.

artificial intelligence, interpolation, machine learning, (18 more...)

arXiv.org Machine Learning

1605.02268

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

A primer on universal function approximation with deep learning (in Torch and R)

@machinelearnbotNov-16-2017, 12:21:14 GMT

Arthur C. Clarke famously stated that "any sufficiently advanced technology is indistinguishable from magic." No current technology embodies this statement more than neural networks and deep learning. And like any good magic it not only dazzles and inspires but also puts fear into people's hearts. One known property of artificial neural networks (ANNs) is that they are universal function approximators. This means that any mathematical function can be represented by a neural network.

artificial intelligence, machine learning, survey article, (19 more...)

@machinelearnbot

Genre: Overview (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.43)

Add feedback

HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation

Xu, Qianqian, Xiong, Jiechao, Chen, Xi, Huang, Qingming, Yao, Yuan

arXiv.org Machine LearningNov-16-2017

Recently, crowdsourcing has emerged as an effective paradigm for human-powered large scale problem solving in various domains. However, task requester usually has a limited amount of budget, thus it is desirable to have a policy to wisely allocate the budget to achieve better quality. In this paper, we study the principle of information maximization for active sampling strategies in the framework of HodgeRank, an approach based on Hodge Decomposition of pairwise ranking data with multiple workers. The principle exhibits two scenarios of active sampling: Fisher information maximization that leads to unsupervised sampling based on a sequential maximization of graph algebraic connectivity without considering labels; and Bayesian information maximization that selects samples with the largest information gain from prior to posterior, which gives a supervised sampling involving the labels collected. Experiments show that the proposed methods boost the sampling efficiency as compared to traditional sampling schemes and are thus valuable to practical crowdsourcing experiments.

artificial intelligence, machine learning, social media, (17 more...)

arXiv.org Machine Learning

1711.05957

Country:

North America > United States (0.93)
Asia (0.69)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Neural Variational Inference and Learning in Undirected Graphical Models

Kuleshov, Volodymyr, Ermon, Stefano

arXiv.org Machine LearningNov-16-2017

Many problems in machine learning are naturally expressed in the language of undirected graphical models. Here, we propose black-box learning and inference algorithms for undirected models that optimize a variational approximation to the log-likelihood of the model. Central to our approach is an upper bound on the log-partition function parametrized by a function q that we express as a flexible neural network. Our bound makes it possible to track the partition function during learning, to speed-up sampling, and to train a broad class of hybrid directed/undirected models via a unified variational inference framework. We empirically demonstrate the effectiveness of our method on several popular generative modeling datasets.

artificial intelligence, machine learning, undirected model, (16 more...)

arXiv.org Machine Learning

1711.02679

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Hinge-Loss Markov Random Fields and Probabilistic Soft Logic

Bach, Stephen H., Broecheler, Matthias, Huang, Bert, Getoor, Lise

arXiv.org Artificial IntelligenceNov-16-2017

A fundamental challenge in developing high-impact machine learning technologies is balancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.

constraint, logic & formal reasoning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1505.04406

Country: North America > United States > California (0.67)

Genre: Research Report > New Finding (0.92)

Industry:

Education > Educational Setting > Online (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(3 more...)

Add feedback

AI and Automation: 10-point guide to its impact on KYC Process in Banks

#artificialintelligenceNov-15-2017, 09:09:50 GMT

The term Artificial Intelligence (AI) has been around for a while. A quick search on the web reveals that the field of modern AI was born in the year 1950, when Alan Turing published a paper on thinking machines. Here we are, almost seven decades later, still in the advent of this emerging technology. Over the last few years, Google CEO, Sundar Pitchai has been speaking about the increasing role of AI in software and it seems like this year might be the inflection point for the field. In May 2017, Pichai explained how at Google, it is an "AI-first" approach for several of its products.

artificial intelligence, machine learning, natural language, (18 more...)

#artificialintelligence

Industry:

Banking & Finance (1.00)
Information Technology > Security & Privacy (0.69)
Information Technology > Services (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.30)

Add feedback