Goto

Collaborating Authors

 Uncertainty


Madrid UPM Advanced Statistics and Data Mining Summer School, June 26 – July 7

@machinelearnbot

The Madrid ASDM summer school is in its twelfth edition this year, with hundreds of students from all over the world having attended so far. It comprises 12 intensive (15 lecture hours) week-long courses, and a student may attend from one up to six courses. The courses cover topics such as Neural Networks and Deep Learning, Bayesian Networks, Big Data with Apache Spark, Bayesian Inference, Text Mining and Time Series, and each has theoretical as well as practical classes, done with R or python. While the summer school is mainly attended by people from academia - PhD students and researchers, people from the industry also assist. The students come from diverse backgrounds, ranging from biology to economics to mathematics and physics.


Linear, Machine Learning and Probabilistic Approaches for Time Series Analysis

@machinelearnbot

In this post, we consider different approaches for time series modeling. The forecasting approaches using linear models, ARIMA alpgorithm, XGBoost machine learning algorithm are described. Results of different model combinations are shown. For probabilistic modeling the approaches using copulas and Bayesian inference are considered. Time series analysis, especially forecasting, is an important problem of modern predictive analytics.


Frequentist Consistency of Variational Bayes

arXiv.org Machine Learning

A key challenge for modern Bayesian statistics is how to perform scalable inference of posterior distributions. To address this challenge, VB methods have emerged as a popular alternative to the classical MCMC methods. VB methods tend to be faster while achieving comparable predictive performance. However, there are few theoretical results around VB. In this paper, we establish frequentist consistency and asymptotic normality of VB methods. Specifically, we connect VB methods to point estimates based on variational approximations, called frequentist variational approximations, and we use the connection to prove a variational Bernstein-von-Mises theorem. The theorem leverages the theoretical characterizations of frequentist variational approximations to understand asymptotic properties of VB. In summary, we prove that (1) the VB posterior converges to the KL minimizer of a normal distribution, centered at the truth and (2) the corresponding variational expectation of the parameter is consistent and asymptotically normal. As applications of the theorem, we derive asymptotic properties of VB posteriors in Bayesian mixture models, Bayesian generalized linear mixed models, and Bayesian stochastic block models. We conduct a simulation study to illustrate these theoretical results.


Improving drug sensitivity predictions in precision medicine through active expert knowledge elicitation

arXiv.org Machine Learning

Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising alternative to improve a prediction model, but collecting such knowledge is laborious to the expert if the number of candidate features is very large. We introduce a probabilistic model that can incorporate expert feedback about the impact of genomic measurements on the sensitivity of a cancer cell for a given drug. We also present two methods to intelligently collect this feedback from the expert, using experimental design and multi-armed bandit models. In a multiple myeloma blood cancer data set (n=51), expert knowledge decreased the prediction error by 8%. Furthermore, the intelligent approaches can be used to reduce the workload of feedback collection to less than 30% on average compared to a naive approach.


Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values

arXiv.org Machine Learning

The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.


Machine Learning with World Knowledge: The Position and Survey

arXiv.org Machine Learning

Machine learning has become pervasive in multiple domains, impacting a wide variety of applications, such as knowledge discovery and data mining, natural language processing, information retrieval, computer vision, social and health informatics, ubiquitous computing, etc. Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. Particularly, labeling large amount of data for each domain-specific problem can be very time consuming and costly. It has become a key obstacle in making learning protocols realistic in applications. In this paper, we will discuss how to use the existing general-purpose world knowledge to enhance machine learning processes, by enriching the features or reducing the labeling work. We start from the comparison of world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision. Finally we discuss the future directions of this research topic.


Learning Local Dependence In Ordered Data

arXiv.org Machine Learning

In many applications, data come with a natural ordering. This ordering can often induce local dependence among nearby variables. However, in complex data, the width of this dependence may vary, making simple assumptions such as a constant neighborhood size unrealistic. We propose a framework for learning this local dependence based on estimating the inverse of the Cholesky factor of the covariance matrix. Penalized maximum likelihood estimation of this matrix yields a simple regression interpretation for local dependence in which variables are predicted by their neighbors. Our proposed method involves solving a convex, penalized Gaussian likelihood problem with a hierarchical group lasso penalty. The problem decomposes into independent subproblems which can be solved efficiently in parallel using first-order methods. Our method yields a sparse, symmetric, positive definite estimator of the precision matrix, encoding a Gaussian graphical model. We derive theoretical results not found in existing methods attaining this structure. In particular, our conditions for signed support recovery and estimation consistency rates in multiple norms are as mild as those in a regression problem. Empirical results show our method performing favorably compared to existing methods. We apply our method to genomic data to flexibly model linkage disequilibrium. Our method is also applied to improve the performance of discriminant analysis in sound recording classification.


What's the Difference Between Machine Learning Techniques?

#artificialintelligence

Artificial intelligence (AI), machine learning (ML), and robots are the sights and sounds of science fiction books and movies. Isaac Asimov's Three Laws of Robotics, first introduced in the 1942 short story "Runaround," became the backbone for his novel I, Robot and its film adaptation (Figure 1). Although we are still far away from achieving what movie producers and sci-fi writers have envisioned, the state of AI and ML has progressed significantly. AI software has also been in use for decades but advances in ML, including the use of deep neural networks (DNNs), are making headlines in application areas like self-driving cars. The movie I, Robot has robots that should be following Asimov's Three Laws of Robotics.


Is Maximum Likelihood Useful for Representation Learning?

#artificialintelligence

A few weeks ago at the DALI Theory of GANs workshop we had a great discussion about what GANs are even useful for. Pretty much everybody agreed that generating random images from a model is not really our goal. We either want to use GANs to train conditional probabilistic models (like we do for image super-resolution or speech synthesis, or something along those lines), or as a means of unsupervised representation learning. Indeed, many papers examine the latent space representations that GANs learn. But the elephant in the room is that nobody really agrees on what unsupervised representation learning really means, and why any GAN variant should be any better or worse at it than others, whether GANs or VAEs are better for that.


Measuring the non-asymptotic convergence of sequential Monte Carlo samplers using probabilistic programming

arXiv.org Machine Learning

A key limitation of sampling algorithms for approximate inference is that it is difficult to quantify their approximation error. Widely used sampling schemes, such as sequential importance sampling with resampling and Metropolis-Hastings, produce output samples drawn from a distribution that may be far from the target posterior distribution. This paper shows how to upper-bound the symmetric KL divergence between the output distribution of a broad class of sequential Monte Carlo (SMC) samplers and their target posterior distributions, subject to assumptions about the accuracy of a separate gold-standard sampler. The proposed method applies to samplers that combine multiple particles, multinomial resampling, and rejuvenation kernels. The experiments show the technique being used to estimate bounds on the divergence of SMC samplers for posterior inference in a Bayesian linear regression model and a Dirichlet process mixture model. This paper builds on a growing body of work begun by [1] and [2] into estimating upper bounds on KL divergences between a sampler's output distribution and the posterior. In variational inference, the KL divergence of the variational approximation is the gap between the variational lower bound and the log-evidence.