AITopics | Uncertainty

Collaborating Authors

Uncertainty

"AI systems–like people–must often act despite partial and uncertain information. First, the information received may be unreliable (e.g., a patient may mis-remember when a disease started, or may not have noticed a symptom that is important to a diagnosis). In addition, rules connecting real-world events can never include all the factors that might determine whether their conclusions really apply (e.g., the correctness of basing a diagnosis on a lab test depends whether there were conditions that might have caused a false positive, on the test being done correctly, on the results being associated with the right patient, etc.) Thus in order to draw useful conclusions, AI systems must be able to reason about the probability of events, given their current knowledge."
– from David Leake, Reasoning Under Uncertainty

News Overviews Instructional Materials AI-Alerts Classics

An Imprecise Probabilistic Estimator for the Transition Rate Matrix of a Continuous-Time Markov Chain

Krak, Thomas, Erreygers, Alexander, De Bock, Jasper

arXiv.org Machine LearningApr-4-2018

We consider the problem of estimating the transition rate matrix of a continuous-time Markov chain from a finite-duration realisation of this process. We approach this problem in an imprecise probabilistic framework, using a set of prior distributions on the unknown transition rate matrix. The resulting estimator is a set of transition rate matrices that, for reasons of conjugacy, is easy to find. To determine the hyperparameters for our set of priors, we reconsider the problem in discrete time, where we can use the well-known Imprecise Dirichlet Model. In particular, we show how the limit of the resulting discrete-time estimators is a continuous-time estimator. It corresponds to a specific choice of hyperparameters and has an exceptionally simple closed-form expression.

estimator, markov chain, matrix, (15 more...)

arXiv.org Machine Learning

1804.0133

Country:

Asia > Japan (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)

Add feedback

You Must Have Clicked on this Ad by Mistake! Data-Driven Identification of Accidental Clicks on Mobile Ads with Applications to Advertiser Cost Discounting and Click-Through Rate Prediction

Tolomei, Gabriele, Lalmas, Mounia, Farahat, Ayman, Haines, Andrew

arXiv.org Machine LearningApr-3-2018

In the cost per click (CPC) pricing model, an advertiser pays an ad network only when a user clicks on an ad; in turn, the ad network gives a share of that revenue to the publisher where the ad was impressed. Still, advertisers may be unsatisfied with ad networks charging them for "valueless" clicks, or so-called accidental clicks. [...] Charging advertisers for such clicks is detrimental in the long term as the advertiser may decide to run their campaigns on other ad networks. In addition, machine-learned click models trained to predict which ad will bring the highest revenue may overestimate an ad click-through rate, and as a consequence negatively impacting revenue for both the ad network and the publisher. In this work, we propose a data-driven method to detect accidental clicks from the perspective of the ad network. We collect observations of time spent by users on a large set of ad landing pages - i.e., dwell time. We notice that the majority of per-ad distributions of dwell time fit to a mixture of distributions, where each component may correspond to a particular type of clicks, the first one being accidental. We then estimate dwell time thresholds of accidental clicks from that component. Using our method to identify accidental clicks, we then propose a technique that smoothly discounts the advertiser's cost of accidental clicks at billing time. Experiments conducted on a large dataset of ads served on Yahoo mobile apps confirm that our thresholds are stable over time, and revenue loss in the short term is marginal. We also compare the performance of an existing machine-learned click model trained on all ad clicks with that of the same model trained only on non-accidental clicks. There, we observe an increase in both ad click-through rate (+3.9%) and revenue (+0.2%) on ads served by the Yahoo Gemini network when using the latter. [...]

artificial intelligence, information management, machine learning, (19 more...)

arXiv.org Machine Learning

1804.06912

Country: North America > United States > New York (0.15)

Genre: Research Report > Experimental Study (0.46)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
(2 more...)

Add feedback

Training VAEs Under Structured Residuals

Dorta, Garoe, Vicente, Sara, Agapito, Lourdes, Campbell, Neill D. F., Simpson, Ivor

arXiv.org Machine LearningApr-3-2018

Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorised likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited as demonstrated by observing a residual image from a VAE reconstruction, which often possess a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modelled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modelling longer range correlations. The advantage of our approach is illustrated on the CelebA face data and the LSUN outdoor churches dataset, with substantial improvements in terms of samples over traditional VAE and better reconstructions.

artificial intelligence, likelihood, machine learning, (20 more...)

arXiv.org Machine Learning

1804.0105

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Large-Scale Cox Process Inference using Variational Fourier Features

John, S. T., Hensman, James

arXiv.org Machine LearningApr-3-2018

Gaussian process modulated Poisson processes provide a flexible framework for modelling spatiotemporal point patterns. So far this had been restricted to one dimension, binning to a pre-determined grid, or small data sets of up to a few thousand data points. Here we introduce Cox process inference based on Fourier features. This sparse representation induces global rather than local constraints on the function space and is computationally efficient. This allows us to formulate a grid-free approximation that scales well with the number of data points and the size of the domain. We demonstrate that this allows MCMC approximations to the non-Gaussian posterior. We also find that, in practice, Fourier features have more consistent optimization behavior than previous approaches. Our approximate Bayesian method can fit over 100,000 events with complex spatiotemporal patterns in three dimensions on a single GPU.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1804.01016

Genre: Research Report (0.40)

Industry: Transportation > Passenger (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

How a Defense of Christianity Revolutionized Brain Science - Facts So Romantic

NautilusApr-2-2018, 09:40:06 GMT

Presbyterian reverend Thomas Bayes had no reason to suspect he'd make any lasting contribution to humankind. Born in England at the beginning of the 18th century, Bayes was a quiet and questioning man. He published only two works in his lifetime. In 1731, he wrote a defense of God's--and the British monarchy's--"divine benevolence," and in 1736, an anonymous defense of the logic of Isaac Newton's calculus. Yet an argument he wrote before his death in 1761 would shape the course of history.

artificial intelligence, brain, machine learning, (16 more...)

Nautilus

Country:

North America > United States (0.48)
Europe > United Kingdom > England (0.35)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Add feedback

Neural Autoregressive Flows

Huang, Chin-Wei, Krueger, David, Lacoste, Alexandre, Courville, Aaron

arXiv.org Machine LearningApr-2-2018

Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1804.00779

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Collaborative targeted minimum loss inference from continuously indexed nuisance parameter estimators

Ju, Cheng, Chambaz, Antoine, van der Laan, Mark J.

arXiv.org Machine LearningMar-30-2018

Suppose that we wish to infer the value of a statistical parameter at a law from which we sample independent observations. Suppose that this parameter is smooth and that we can define two variation-independent, infinite-dimensional features of the law, its so called Q- and G-components (comp.), such that if we estimate them consistently at a fast enough product of rates, then we can build a confidence interval (CI) with a given asymptotic level based on a plain targeted minimum loss estimator (TMLE). The estimators of the Q- and G-comp. would typically be by products of machine learning algorithms. We focus on the case that the machine learning algorithm for the G-comp. is fine-tuned by a real-valued parameter h. Then, a plain TMLE with an h chosen by cross-validation would typically not lend itself to the construction of a CI, because the selection of h would trade-off its empirical bias with something akin to the empirical variance of the estimator of the G-comp. as opposed to that of the TMLE. A collaborative TMLE (C-TMLE) might, however, succeed in achieving the relevant trade-off. We construct a C-TMLE and show that, under high-level empirical processes conditions, and if there exists an oracle h that makes a bulky remainder term asymptotically Gaussian, then the C-TMLE is asymptotically Gaussian hence amenable to building a CI provided that its asymptotic variance can be estimated too. We illustrate the construction and main result with the inference of the average treatment effect, where the Q-comp. consists in a marginal law and a conditional expectation, and the G-comp. is a propensity score (a conditional probability). We also conduct a multi-faceted simulation study to investigate the empirical properties of the collaborative TMLE when the G-comp. is estimated by the LASSO. Here, h is the bound on the l1-norm of the candidate coefficients.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

1804.00102

Country:

North America > United States (0.45)
Europe > Austria (0.27)

Genre: Research Report > Experimental Study (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Mode-Seeking Clustering and Density Ridge Estimation via Direct Estimation of Density-Derivative-Ratios

Sasaki, Hiroaki, Kanamori, Takafumi, Hyvärinen, Aapo, Niu, Gang, Sugiyama, Masashi

arXiv.org Machine LearningMar-30-2018

Modes and ridges of the probability density function behind observed data are useful geometric features. Mode-seeking clustering assigns cluster labels by associating data samples with the nearest modes, and estimation of density ridges enables us to find lower-dimensional structures hidden in data. A key technical challenge both in mode-seeking clustering and density ridge estimation is accurate estimation of the ratios of the first- and second-order density derivatives to the density. A naive approach takes a three-step approach of first estimating the data density, then computing its derivatives, and finally taking their ratios. However, this three-step approach can be unreliable because a good density estimator does not necessarily mean a good density derivative estimator, and division by the estimated density could significantly magnify the estimation error. To cope with these problems, we propose a novel estimator for the \emph{density-derivative-ratios}. The proposed estimator does not involve density estimation, but rather \emph{directly} approximates the ratios of density derivatives of any order. Moreover, we establish a convergence rate of the proposed estimator. Based on the proposed estimator, novel methods both for mode-seeking clustering and density ridge estimation are developed, and the respective convergence rates to the mode and ridge of the underlying density are also established. Finally, we experimentally demonstrate that the developed methods significantly outperform existing methods, particularly for relatively high-dimensional data.

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Machine Learning

1707.01711

Country:

Europe > United Kingdom (0.46)
North America > United States (0.45)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)

Add feedback

Notes on computational-to-statistical gaps: predictions using statistical physics

Bandeira, Afonso S., Perry, Amelia, Wein, Alexander S.

arXiv.org Machine LearningMar-29-2018

In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.

algorithm, artificial intelligence, bayesian inference, (14 more...)

arXiv.org Machine Learning

1803.11132

Country:

North America > United States > New York (0.24)
North America > United States > Massachusetts (0.14)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.54)

Industry:

Energy > Oil & Gas (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Copula Variational Bayes inference via information geometry

Tran, Viet Hung

arXiv.org Machine LearningMar-29-2018

Variational Bayes (VB), also known as independent mean-field approximation, has become a popular method for Bayesian network inference in recent years. Its application is vast, e.g. in neural network, compressed sensing, clustering, etc. to name just a few. In this paper, the independence constraint in VB will be relaxed to a conditional constraint class, called copula in statistics. Since a joint probability distribution always belongs to a copula class, the novel copula VB (CVB) approximation is a generalized form of VB. Via information geometry, we will see that CVB algorithm iteratively projects the original joint distribution to a copula constraint space until it reaches a local minimum Kullback-Leibler (KL) divergence. By this way, all mean-field approximations, e.g. iterative VB, Expectation-Maximization (EM), Iterated Conditional Mode (ICM) and k-means algorithms, are special cases of CVB approximation. For a generic Bayesian network, an augmented hierarchy form of CVB will also be designed. While mean-field algorithms can only return a locally optimal approximation for a correlated network, the augmented CVB network, which is an optimally weighted average of a mixture of simpler network structures, can potentially achieve the globally optimal approximation for the first time. Via simulations of Gaussian mixture clustering, the classification's accuracy of CVB will be shown to be far superior to that of state-of-the-art VB, EM and k-means algorithms.

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1803.10998

Country:

Europe (0.92)
North America > United States (0.45)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback