Collaborating Authors: Corander, Jukka


T-SNE Is Not Optimized to Reveal Clusters in Data

arXiv.org Machine Learning

The rapid growth in the amount of data processed by analysts demands more efficient methods for digesting and communicating information. Data visualization by dimensionality reduction helps a viewer digest information in massive data sets quickly. Therefore, it is increasingly applied as a critical component in scientific research, digital libraries, data mining, financial data analysis, market studies, manufacturing production control, drug discovery, etc. Stochastic Neighbor Embedding (SNE) [4] is a widely used nonlinear dimensionality reduction (NLDR) method, which approximately preserves the pairwise probabilities of being neighbors (neighboring probabilities for short) in the input space. In particular, Student t-Distributed Stochastic Neighbor Embedding (t-SNE) [9] has become one of the most popular nonlinear dimensionality reduction methods for data visualization. The t-SNE method employs a heavy-tailed distribution for the neighboring probabilities in the embedding and minimizes their Kullback-Leibler divergence against the precomputed input probabilities.
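
To make the objective concrete, here is a minimal usage sketch with scikit-learn's TSNE; the dataset and hyperparameters are illustrative choices, not taken from the paper.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# t-SNE minimizes KL(P || Q): P holds the precomputed input-space
# neighboring probabilities, while Q applies a heavy-tailed Student-t
# kernel to pairwise distances in the low-dimensional embedding.
embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                 random_state=0).fit_transform(X)
print(embedding.shape)  # (1797, 2)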


Approximate Bayesian inference from noisy likelihoods with Gaussian process emulated MCMC

arXiv.org Machine Learning

We present an efficient approach for performing approximate Bayesian inference when only a limited number of noisy likelihood evaluations can be obtained due to computational constraints, which is becoming increasingly common for applications of complex models. Our main methodological innovation is to model the log-likelihood function using a Gaussian process (GP) in a local fashion and apply this model to emulate the progression that an exact Metropolis-Hastings (MH) algorithm would take if it were applicable. New log-likelihood evaluation locations are selected using sequential experimental design strategies such that each MH accept/reject decision is made within a pre-specified error tolerance. The resulting approach is conceptually simple and sample-efficient, as it takes full advantage of the GP model. Compared with various existing inference methods based on global GP surrogate modelling, it is also more robust to violations of GP modelling assumptions and better suited for the typical situation where the posterior is substantially more concentrated than the prior. We discuss the probabilistic interpretations and central theoretical aspects of our approach, and then demonstrate the benefits of the resulting algorithm in the context of likelihood-free inference for simulator-based statistical models.
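
The core idea can be sketched roughly as follows; this is a simplified illustration, not the authors' algorithm, and in particular the sequential design step that keeps each accept/reject decision within the error tolerance is omitted.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def log_prior(theta):
    return norm.logpdf(theta, loc=0.0, scale=10.0)

# Stand-in for a handful of expensive, noisy log-likelihood evaluations.
design = np.linspace(-3.0, 3.0, 15).reshape(-1, 1)
loglik = -0.5 * (design.ravel() - 1.0) ** 2 + 0.1 * rng.standard_normal(15)
gp = GaussianProcessRegressor(alpha=0.1 ** 2).fit(design, loglik)

theta, chain = 0.0, []
for _ in range(2000):
    prop = theta + 0.5 * rng.standard_normal()
    mu, sd = gp.predict(np.array([[prop], [theta]]), return_std=True)
    # Emulated MH ratio based on the GP predictive mean. The paper also
    # uses the predictive uncertainty (sd) to bound the probability that
    # this decision differs from the exact one, acquiring new evaluations
    # until a pre-specified tolerance is met.
    log_alpha = (mu[0] + log_prior(prop)) - (mu[1] + log_prior(theta))
    if np.log(rng.random()) < log_alpha:
        theta = prop
    chain.append(theta)

print(np.mean(chain))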


Likelihood-Free Inference with Deep Gaussian Processes

arXiv.org Machine Learning

In recent years, surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations. The current state-of-the-art performance for this task has been achieved by Bayesian Optimization with Gaussian Processes (GPs). While this combination works well for unimodal target distributions, it restricts the flexibility and applicability of Bayesian Optimization for accelerating likelihood-free inference more generally. We address this problem by proposing a Deep Gaussian Process (DGP) surrogate model that can handle more irregularly behaved target distributions. Our experiments show how DGPs can outperform GPs on objective functions with multimodal distributions while maintaining comparable performance in unimodal cases. This confirms that DGPs as surrogate models can extend the applicability of Bayesian Optimization for likelihood-free inference (BOLFI), while adding computational overhead that remains negligible for computationally intensive simulators.
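
For reference, the baseline BOLFI-style loop with an ordinary GP surrogate over the discrepancy can be sketched as below; the paper's contribution is replacing this surrogate with a DGP. The toy discrepancy function and the acquisition rule are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def simulate_discrepancy(theta):
    # Stand-in for: run the simulator at theta and compare summary
    # statistics of the simulated data against the observed data.
    return np.sin(3.0 * theta) + theta ** 2 / 4.0 + 0.05 * rng.standard_normal()

thetas = list(rng.uniform(-2.0, 2.0, 5))
discrepancies = [simulate_discrepancy(t) for t in thetas]
grid = np.linspace(-2.0, 2.0, 400)

for _ in range(20):  # Bayesian optimization loop
    gp = GaussianProcessRegressor(alpha=1e-3, normalize_y=True).fit(
        np.asarray(thetas).reshape(-1, 1), discrepancies)
    mu, sd = gp.predict(grid.reshape(-1, 1), return_std=True)
    acq = mu - 2.0 * sd                   # lower confidence bound
    t_next = grid[np.argmin(acq)]         # next simulator evaluation
    thetas.append(t_next)
    discrepancies.append(simulate_discrepancy(t_next))

print(grid[np.argmin(gp.predict(grid.reshape(-1, 1)))])  # rough minimiser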


High-dimensional structure learning of binary pairwise Markov networks: A comparative numerical study

arXiv.org Machine Learning

Learning the undirected graph structure of a Markov network from data is a problem that has received a lot of attention during the last few decades. As a result of the general applicability of the model class, a myriad of methods have been developed in parallel in several research fields. Recently, as the size of the considered systems has increased, the focus of new methods has shifted towards the high-dimensional domain. In particular, the introduction of the pseudo-likelihood function has pushed the limits of score-based methods originally based on the likelihood. At the same time, an array of methods based on simple pairwise tests have been developed to meet the challenges set by increasingly large data sets in computational biology. Apart from both being applicable to high-dimensional problems, methods based on the pseudo-likelihood and on pairwise tests are fundamentally very different. In this work, we perform an extensive numerical study comparing the different types of methods on data generated by binary pairwise Markov networks. For sampling large networks, we use a parallelizable Gibbs sampler based on sparse restricted Boltzmann machines. Our results show that pairwise methods can be more accurate than pseudo-likelihood methods in settings often encountered in high-dimensional structure learning.
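
For a flavour of the pairwise-test family, a minimal baseline might look like the following; the mutual-information test and the fixed threshold are illustrative stand-ins for the specific tests compared in the study.

import numpy as np
from itertools import combinations
from sklearn.metrics import mutual_info_score

def pairwise_graph(X, threshold=0.01):
    # Declare an edge (i, j) when the empirical mutual information between
    # binary columns i and j exceeds a fixed threshold.
    # X: (n_samples, n_vars) binary data matrix.
    edges = []
    for i, j in combinations(range(X.shape[1]), 2):
        if mutual_info_score(X[:, i], X[:, j]) > threshold:
            edges.append((i, j))
    return edges

# On independent binary noise, few or no edges should be selected.
rng = np.random.default_rng(0)
print(pairwise_graph(rng.integers(0, 2, size=(500, 8))))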


Likelihood-free inference by ratio estimation

arXiv.org Machine Learning

We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several so-called likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gaussian probability distribution. In another popular approach called approximate Bayesian computation, the inference is performed by identifying parameter values for which the summary statistics of the simulated data are close to those of the observed data. Synthetic likelihood is easier to use as no measure of "closeness" is required, but the Gaussianity assumption is often limiting. Moreover, both approaches require judiciously chosen summary statistics. Here we present an alternative inference approach that is as easy to use as synthetic likelihood but not as restricted in its assumptions, and that, in a natural way, enables automatic selection of relevant summary statistics from a large set of candidates. The basic idea is to frame the problem of estimating the posterior as a problem of estimating the ratio between the data generating distribution and the marginal distribution. This problem can be solved by logistic regression, and including regularising penalty terms enables automatic selection of the summary statistics relevant to the inference task. We illustrate the general theory on toy problems and use it to perform inference for stochastic nonlinear dynamical systems.
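
A minimal sketch of the ratio-estimation step, assuming the summary statistics have already been computed; the helper name and penalty setting are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_log_ratio(summaries_theta, summaries_marginal, C=1.0):
    # L1-penalised logistic regression separating summaries simulated at a
    # fixed theta (label 1) from summaries simulated under the marginal
    # (label 0). With balanced classes, the decision function estimates
    # log p(s | theta) / p(s), and the L1 penalty zeroes out the weights
    # of uninformative summary statistics.
    X = np.vstack([summaries_theta, summaries_marginal])
    y = np.r_[np.ones(len(summaries_theta)), np.zeros(len(summaries_marginal))]
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    return lambda s: clf.decision_function(np.atleast_2d(s))

# Toy usage with 2-D summaries where only the first one is informative.
rng = np.random.default_rng(0)
s_theta = rng.normal([1.0, 0.0], 1.0, size=(1000, 2))
s_marg = rng.normal([0.0, 0.0], 1.0, size=(1000, 2))
print(fit_log_ratio(s_theta, s_marg)([1.0, 0.0]))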


ELFI: Engine for Likelihood-Free Inference

arXiv.org Machine Learning

The Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for specifying LFI models, which are commonly composed of priors, simulators, summaries, distances and other custom operations; these can be implemented in a wide variety of languages. Separating the modelling task from the inference makes it possible to use the same model with any of the available inference methods without modifications. A central method implemented in ELFI is Bayesian Optimization for Likelihood-Free Inference (BOLFI), which has recently been shown to accelerate likelihood-free inference by up to several orders of magnitude. ELFI also has inbuilt support for storing output data for reuse and analysis, and supports parallelization of computation from multiple cores up to a cluster environment. ELFI is designed to be extensible and provides interfaces for widening its functionality, which makes the addition of new inference methods straightforward and automatically compatible with the inbuilt features.
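
A minimal model specification following ELFI's documented workflow might look like this toy Gaussian-mean example (exact signatures may vary between ELFI versions):

import elfi
import numpy as np

def simulator(mu, batch_size=1, random_state=None):
    random_state = random_state or np.random
    mu = np.asarray(mu).reshape(-1, 1)          # one mean per batch row
    return random_state.normal(mu, 1.0, size=(batch_size, 30))

y_obs = np.random.normal(1.5, 1.0, size=(1, 30))   # stand-in observed data

mu = elfi.Prior('uniform', -5, 10)                 # Uniform(-5, 5) prior
sim = elfi.Simulator(simulator, mu, observed=y_obs)
S = elfi.Summary(lambda x: np.mean(x, axis=1), sim)
d = elfi.Distance('euclidean', S)

rej = elfi.Rejection(d, batch_size=1000)           # basic rejection ABC
result = rej.sample(500, quantile=0.01)
print(result.samples['mu'].mean())

Because the model is a graph of operations separate from the inference method, the same distance node d could instead be handed to another inference method, e.g. BOLFI, without changing the model.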


Likelihood-free inference via classification

arXiv.org Machine Learning

Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost of evaluating the likelihood function, and thus of performing likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. By transforming the original problem into a problem of classifying the data as simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods thereby becomes available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
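
The classifier-based discrepancy can be sketched as follows; the choice of classifier and cross-validation scheme here are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classification_discrepancy(x_obs, x_sim):
    # Cross-validated accuracy of a classifier separating observed from
    # simulated data points; values near 0.5 mean the two data sets are
    # hard to tell apart, values near 1.0 mean they differ clearly.
    X = np.vstack([x_obs, x_sim])
    y = np.r_[np.zeros(len(x_obs)), np.ones(len(x_sim))]
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Within likelihood-free inference, a parameter value is favoured when the
# discrepancy of its simulated data stays close to 0.5.
rng = np.random.default_rng(0)
print(classification_discrepancy(rng.normal(0, 1, (200, 3)),
                                 rng.normal(0, 1, (200, 3))))  # ~0.5
print(classification_discrepancy(rng.normal(0, 1, (200, 3)),
                                 rng.normal(2, 1, (200, 3))))  # ~1.0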


Inferring Cognitive Models from Data using Approximate Bayesian Computation

arXiv.org Machine Learning

An important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data. This is a difficult problem because of the substantial complexity and variety in human behavioral strategies. We report an investigation into a new approach using approximate Bayesian computation (ABC) to condition model parameters on data and prior knowledge. As a case study we examine menu interaction, where only click-time data are available to infer a cognitive model that implements search behaviour with parameters such as fixation duration and recall probability. Our results demonstrate that ABC (i) improves estimates of model parameter values, (ii) enables meaningful comparisons between model variants, and (iii) supports fitting models to individual users. ABC provides ample opportunities for theoretical HCI research by allowing principled inference of model parameter values and their uncertainty.
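
As a schematic of how rejection ABC conditions parameters on click-time data, consider the toy sketch below; the simulator, priors, summary statistic and tolerance are all hypothetical stand-ins, not the paper's cognitive model.

import numpy as np

def simulate_click_times(fixation_ms, recall_prob, n_trials=200, rng=None):
    # Toy stand-in model: a recalled menu item is found in one fixation,
    # otherwise several fixations of search are needed.
    rng = rng or np.random.default_rng()
    fixations = np.where(rng.random(n_trials) < recall_prob,
                         1, rng.integers(2, 8, n_trials))
    return fixations * fixation_ms + rng.normal(0.0, 50.0, n_trials)

def abc_rejection(observed, n_draws=10000, eps=40.0, seed=0):
    rng = np.random.default_rng(seed)
    obs_mean = observed.mean()
    accepted = []
    for _ in range(n_draws):
        fix = rng.uniform(100.0, 600.0)   # prior: fixation duration (ms)
        rec = rng.uniform(0.0, 1.0)       # prior: recall probability
        sim = simulate_click_times(fix, rec, rng=rng)
        if abs(sim.mean() - obs_mean) < eps:   # summary: mean click time
            accepted.append((fix, rec))
    return np.array(accepted)

observed = simulate_click_times(280.0, 0.7, rng=np.random.default_rng(1))
posterior = abc_rejection(observed)
print(len(posterior), posterior.mean(axis=0))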


Fast k-NN search

arXiv.org Machine Learning

Efficient index structures for fast approximate nearest neighbor queries are required in many applications such as recommendation systems. In high-dimensional spaces, many conventional methods suffer from excessive memory usage and slow response times. We propose a method where multiple random projection trees are combined by a novel voting scheme. The key idea is to exploit the redundancy in a large number of candidate sets obtained by independently generated random projections in order to reduce the number of expensive exact distance evaluations. The method is straightforward to implement using sparse projections, which leads to a reduced memory footprint and fast index construction. Furthermore, it enables grouping of the required computations into big matrix multiplications, which leads to additional savings due to cache effects and low-level parallelization. We demonstrate by extensive experiments on a wide variety of data sets that the method is faster than existing partitioning-tree or hashing-based approaches, making it the fastest available technique at high accuracy levels.
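
A much-simplified illustration of the voting idea, where each random projection plays the role of a depth-one random projection tree and exact distances are computed only for points that collect enough votes; this is a sketch of the principle, not the authors' implementation.

import numpy as np
from collections import Counter

def build_index(X, n_projections=20, n_buckets=16, seed=0):
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_projections, X.shape[1]))
    proj = X @ dirs.T                       # one big matrix multiplication
    # Equal-frequency bucket edges, computed separately per projection.
    edges = np.quantile(proj, np.linspace(0, 1, n_buckets + 1)[1:-1], axis=0)
    buckets = np.stack([np.searchsorted(edges[:, p], proj[:, p])
                        for p in range(n_projections)], axis=1)
    return dirs, edges, buckets

def query(q, X, index, k=10, min_votes=5):
    dirs, edges, buckets = index
    votes = Counter()
    for p in range(len(dirs)):              # vote: same bucket as the query
        b = np.searchsorted(edges[:, p], q @ dirs[p])
        votes.update(np.flatnonzero(buckets[:, p] == b))
    cand = np.array([i for i, v in votes.items() if v >= min_votes])
    if cand.size == 0:
        cand = np.arange(len(X))            # fall back to brute force
    dist = np.linalg.norm(X[cand] - q, axis=1)   # exact distances, candidates only
    return cand[np.argsort(dist)[:k]]

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 64))
print(query(X[0], X, build_index(X), k=5))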


On the inconsistency of $\ell_1$-penalised sparse precision matrix estimation

arXiv.org Machine Learning

Various $\ell_1$-penalised estimation methods such as graphical lasso and CLIME are widely used for sparse precision matrix estimation. Many of these methods have been shown to be consistent under various quantitative assumptions about the underlying true covariance matrix. Intuitively, these conditions are related to situations where the penalty term will dominate the optimisation. In this paper, we explore the consistency of $\ell_1$-based methods for a class of sparse latent-variable-like models, which are strongly motivated by several types of applications. We show that all $\ell_1$-based methods fail dramatically for models with nearly linear dependencies between the variables. We also study the consistency on models derived from real gene expression data and note that the assumptions needed for consistency never hold even for modest-sized gene networks; $\ell_1$-based methods also become unreliable in practice for larger networks.
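
A toy experiment in the paper's spirit, under an illustrative near-linear dependency (not the paper's exact model class): the dependency inflates the off-diagonal precision entries, so the selected support stays dense across a wide range of penalties, and convergence itself can become fragile.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, d = 2000, 6
X = rng.standard_normal((n, d))
# Make the last variable nearly a linear combination of the others.
X[:, -1] = X[:, :-1].sum(axis=1) + 0.2 * rng.standard_normal(n)

for alpha in (0.02, 0.1, 0.4):
    prec = GraphicalLasso(alpha=alpha, max_iter=500).fit(X).precision_
    n_edges = int((np.abs(prec[np.triu_indices(d, 1)]) > 1e-3).sum())
    print(f"alpha={alpha}: {n_edges} of {d * (d - 1) // 2} possible edges")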