Uncertainty
Learning the intensity of time events with change-points
Alaya, Mokhtar Zahdi, Gaรฏffas, Stรฉphane, Guilloux, Agathe
We consider the problem of learning the inhomogeneous intensity of a counting process, under a sparse segmentation assumption. We introduce a weighted total-variation penalization, using data-driven weights that correctly scale the penalization along the observation interval. We prove that this leads to a sharp tuning of the convex relaxation of the segmentation prior, by stating oracle inequalities with fast rates of convergence, and consistency for change-points detection. This provides first theoretical guarantees for segmentation with a convex proxy beyond the standard i.i.d signal + white noise setting. We introduce a fast algorithm to solve this convex problem. Numerical experiments illustrate our approach on simulated and on a high-frequency genomics dataset.
Classical vs. Bayesian methods for linear system identification: point estimators and confidence sets
Romeres, D., Prando, G., Pillonetto, G., Chiuso, A.
This paper compares classical parametric methods with recently developed Bayesian methods for system identification. A Full Bayes solution is considered together with one of the standard approximations based on the Empirical Bayes paradigm. Results regarding point estimators for the impulse response as well as for confidence regions are reported.
Identification of stable models via nonparametric prediction error methods
Romeres, Diego, Pillonetto, Gianluigi, Chiuso, Alessandro
A new Bayesian approach to linear system identification has been proposed in a series of recent papers. The main idea is to frame linear system identification as predictor estimation in an infinite dimensional space, with the aid of regularization/Bayesian techniques. This approach guarantees the identification of stable predictors based on the prediction error minimization. Unluckily, the stability of the predictors does not guarantee the stability of the impulse response of the system. In this paper we propose and compare various techniques to address this issue. Simulations results comparing these techniques will be provided.
Exploring Algorithmic Limits of Matrix Rank Minimization under Affine Constraints
Many applications require recovering a matrix of minimal rank within an affine constraint set, with matrix completion a notable special case. Because the problem is NP-hard in general, it is common to replace the matrix rank with the nuclear norm, which acts as a convenient convex surrogate. While elegant theoretical conditions elucidate when this replacement is likely to be successful, they are highly restrictive and convex algorithms fail when the ambient rank is too high or when the constraint set is poorly structured. Non-convex alternatives fare somewhat better when carefully tuned; however, convergence to locally optimal solutions remains a continuing source of failure. Against this backdrop we derive a deceptively simple and parameter-free probabilistic PCA-like algorithm that is capable, over a wide battery of empirical tests, of successful recovery even at the theoretical limit where the number of measurements equal the degrees of freedom in the unknown low-rank matrix. Somewhat surprisingly, this is possible even when the affine constraint set is highly ill-conditioned. While proving general recovery guarantees remains evasive for non-convex algorithms, Bayesian-inspired or otherwise, we nonetheless show conditions whereby the underlying cost function has a unique stationary point located at the global optimum; no existing cost function we are aware of satisfies this same property. We conclude with a simple computer vision application involving image rectification and a standard collaborative filtering benchmark.
On the Equivalence of Factorized Information Criterion Regularization and the Chinese Restaurant Process Prior
Factorized Information Criterion (FIC) is a recently developed information criterion, based on which a novel model selection methodology, namely Factorized Asymptotic Bayesian (FAB) Inference, has been developed and successfully applied to various hierarchical Bayesian models. The Dirichlet Process (DP) prior, and one of its well known representations, the Chinese Restaurant Process (CRP), derive another line of model selection methods. FIC can be viewed as a prior distribution over the latent variable configurations. Under this view, we prove that when the parameter dimensionality $D_{c}=2$, FIC is equivalent to CRP. We argue that when $D_{c}>2$, FIC avoids an inherent problem of DP/CRP, i.e. the data likelihood will dominate the impact of the prior, and thus the model selection capability will weaken as $D_{c}$ increases. However, FIC overestimates the data likelihood. As a result, FIC may be overly biased towards models with less components. We propose a natural generalization of FIC, which finds a middle ground between CRP and FIC, and may yield more accurate model selection results than FIC.
Divide-and-Conquer with Sequential Monte Carlo
Lindsten, Fredrik, Johansen, Adam M., Naesseth, Christian A., Kirkpatrick, Bonnie, Schรถn, Thomas B., Aston, John, Bouchard-Cรดtรฉ, Alexandre
We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divide-and-conquer approach based upon an auxiliary tree-structured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved sub-problems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC sampler, the proposed Divide-and-Conquer SMC employs multiple independent populations of weighted particles, which are resampled, merged, and propagated as the method progresses. We illustrate empirically that this approach can outperform standard methods in terms of the accuracy of the posterior expectation and marginal likelihood approximations. Divide-and-Conquer SMC also opens up novel parallel implementation options and the possibility of concentrating the computational effort on the most challenging sub-problems. We demonstrate its performance on a Markov random field and on a hierarchical logistic regression problem.
Probabilistic Inference Techniques for Scalable Multiagent Decision Making
Kumar, Akshat, Zilberstein, Shlomo, Toussaint, Marc
Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models---NEXP-Complete even for two agents---has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques.
Non-Normal Mixtures of Experts
Mixture of Experts (MoE) is a popular framework for modeling heterogeneity in data for regression, classification and clustering. For continuous data which we consider here in the context of regression and cluster analysis, MoE usually use normal experts, that is, expert components following the Gaussian distribution. However, for a set of data containing a group or groups of observations with asymmetric behavior, heavy tails or atypical observations, the use of normal experts may be unsuitable and can unduly affect the fit of the MoE model. In this paper, we introduce new non-normal mixture of experts (NNMoE) which can deal with these issues regarding possibly skewed, heavy-tailed data and with outliers. The proposed models are the skew-normal MoE and the robust $t$ MoE and skew $t$ MoE, respectively named SNMoE, TMoE and STMoE. We develop dedicated expectation-maximization (EM) and expectation conditional maximization (ECM) algorithms to estimate the parameters of the proposed models by monotonically maximizing the observed data log-likelihood. We describe how the presented models can be used in prediction and in model-based clustering of regression data. Numerical experiments carried out on simulated data show the effectiveness and the robustness of the proposed models in terms modeling non-linear regression functions as well as in model-based clustering. Then, to show their usefulness for practical applications, the proposed models are applied to the real-world data of tone perception for musical data analysis, and the one of temperature anomalies for the analysis of climate change data.
Nonparametric Estimation of Band-limited Probability Density Functions
Agarwal, Rahul, Chen, Zhe, Sarma, Sridevi V.
In this paper, a nonparametric maximum likelihood (ML) estimator for band-limited (BL) probability density functions (pdfs) is proposed. The BLML estimator is consistent and computationally efficient. To compute the BLML estimator, three approximate algorithms are presented: a binary quadratic programming (BQP) algorithm for medium scale problems, a Trivial algorithm for large-scale problems that yields a consistent estimate if the underlying pdf is strictly positive and BL, and a fast implementation of the Trivial algorithm that exploits the band-limited assumption and the Nyquist sampling theorem ("BLMLQuick"). All three BLML estimators outperform kernel density estimation (KDE) algorithms (adaptive and higher order KDEs) with respect to the mean integrated squared error for data generated from both BL and infinite-band pdfs. Further, the BLMLQuick estimate is remarkably faster than the KD algorithms. Finally, the BLML method is applied to estimate the conditional intensity function of a neuronal spike train (point process) recorded from a rat's entorhinal cortex grid cell, for which it outperforms state-of-the-art estimators used in neuroscience.
Modelling of directional data using Kent distributions
The modelling of data on a spherical surface requires the consideration of directional probability distributions. To model asymmetrically distributed data on a three-dimensional sphere, Kent distributions are often used. The moment estimates of the parameters are typically used in modelling tasks involving Kent distributions. However, these lack a rigorous statistical treatment. The focus of the paper is to introduce a Bayesian estimation of the parameters of the Kent distribution which has not been carried out in the literature, partly because of its complex mathematical form. We employ the Bayesian information-theoretic paradigm of Minimum Message Length (MML) to bridge this gap and derive reliable estimators. The inferred parameters are subsequently used in mixture modelling of Kent distributions. The problem of inferring the suitable number of mixture components is also addressed using the MML criterion. We demonstrate the superior performance of the derived MML-based parameter estimates against the traditional estimators. We apply the MML principle to infer mixtures of Kent distributions to model empirical data corresponding to protein conformations. We demonstrate the effectiveness of Kent models to act as improved descriptors of protein structural data as compared to commonly used von Mises-Fisher distributions.