Uncertainty
An Introduction to the Practical and Theoretical Aspects of Mixture-of-Experts Modeling
Nguyen, Hien D., Chamroukhi, Faicel
Mixture-of-experts (MoE) models are a powerful paradigm for modeling of data arising from complex data generating processes (DGPs). In this article, we demonstrate how different MoE models can be constructed to approximate the underlying DGPs of arbitrary types of data. Due to the probabilistic nature of MoE models, we propose the maximum quasi-likelihood (MQL) estimator as a method for estimating MoE model parameters from data, and we provide conditions under which MQL estimators are consistent and asymptotically normal. The blockwise minorization-maximizatoin (blockwise-MM) algorithm framework is proposed as an all-purpose method for constructing algorithms for obtaining MQL estimators. An example derivation of a blockwise-MM algorithm is provided. We then present a method for constructing information criteria for estimating the number of components in MoE models and provide justification for the classic Bayesian information criterion (BIC). We explain how MoE models can be used to conduct classification, clustering, and regression and we illustrate these applications via a pair of worked examples.
Post-Inference Prior Swapping
Neiswanger, Willie, Xing, Eric
While Bayesian methods are praised for their ability to incorporate useful prior knowledge, in practice, convenient priors that allow for computationally cheap or tractable inference are commonly used. In this paper, we investigate the following question: for a given model, is it possible to compute an inference result with any convenient false prior, and afterwards, given any target prior of interest, quickly transform this result into the target posterior? A potential solution is to use importance sampling (IS). However, we demonstrate that IS will fail for many choices of the target prior, depending on its parametric form and similarity to the false prior. Instead, we propose prior swapping, a method that leverages the pre-inferred false posterior to efficiently generate accurate posterior samples under arbitrary target priors. Prior swapping lets us apply less-costly inference algorithms to certain models, and incorporate new or updated prior information "post-inference". We give theoretical guarantees about our method, and demonstrate it empirically on a number of models and priors.
Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
Buccapatnam, Swapna, Liu, Fang, Eryilmaz, Atilla, Shroff, Ness B.
We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends' activity, thus providing side information about each other's preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) as a function of the bi-partite network structure on the regret of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies - a randomized policy; and a policy based on the well-known upper confidence bound (UCB) policies - both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor, independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies.
Composing inference algorithms as program transformations
Zinkov, Robert, Shan, Chung-chieh
Probabilistic inference procedures are usually coded painstakingly from scratch, for each target model and each inference algorithm. We reduce this effort by generating inference procedures from models automatically. We make this code generation modular by decomposing inference algorithms into reusable program-to-program transformations. These transformations perform exact inference as well as generate probabilistic programs that compute expectations, densities, and MCMC samples. The resulting inference procedures are about as accurate and fast as other probabilistic programming systems on real-world problems.
Initialising Kernel Adaptive Filters via Probabilistic Inference
Castro, Ivรกn, Silva, Cristรณbal, Tobar, Felipe
Within kernel methods, kernel adaptive filters (KAFs) [1] are state-of-the-art nonlinear models for time series that build on the properties of reproducing kernel Hilbert spaces (RKHS) [2], in order to provide accurate predictions at a low computational cost. In the same way that support vectors play a fundamental role in support vector machines [3], KAFs rely on a subset of observed input samples referred to as centres, where new inputs are compared to these centres through a kernel function to compute the prediction. This procedure involves a number of parameters: those of the kernel, those related to the selection of the set of centres (dictionary), and those controlling the tradeoff between historical data and new observations. By adapting these model parameters, algorithms, such as kernel least mean square (KLMS) [4], [5] provide an efficient way to improve signal estimation over time as more data become available. Specifically, KLMS applies the least-mean-square rationale to the "kernelised" input (i.e., transformed by the kernel function), thus allowing for an efficient online implementation based on gradient steepest descent for updating the model parameters (i.e., the filter weights only). The main drawback of KAFs is the lack of a principled approach to tune filter weights, kernel parameters and the dictionary.
The Evolution of Machine Learning
In recent years, the term'machine learning' has become very popular among developers and business alike, even though research in the field has been going on for decades. Essentially, machine learning is about teaching machines to learn concepts and techniques the way humans do. Earlier, machines were only able to think in boolean logic โ having a stringent'yes' (1) or'no' (0) answer (output) to a question (input). This limited the type of questions one could ask a machine. Fuzzy logic systems were later introduced to address this particular issue by enabling machines to answer on a scale of values ranging from no to yes.
Sparse inference of the drift of a high-dimensional Ornstein-Uhlenbeck process
Gaรฏffas, Stรฉphane, Matulewicz, Gustaw
The Ornstein-Uhlenbeck, also called mean-reverting diffusion process, describes a process which evolves following a deterministic linear part with an added Gaussian noise, similarly to a vectorautoregressive process in discrete time. This model is ubiquitous in quantitative finance, for instance the one-dimensional version is used for modeling rates and is called the Vasicek model [Hul09]. In a multidimensional setting, it can be therefore used to describe systems with linear interactions perturbed by Gaussian noise, see Figure 1 below. Among many others, an example of application is inter-bank lending [CFS15, FI13], where lending is a flux of reserves and is proportional to the difference in reserves. A natural question is therefore how to estimate the interaction structure from the observation of the process. Unfortunately, the optimal solution based on the maximum likelihood estimator (MLE) is typically quite inaccurate in high-dimensional settings, because of the well-known curse of dimensionality, see for instance [BvdG11]. However, in real-world applications, the interaction structure is sparse: in the example mentioned above, banks have typically only a few lending partners [GG14, GSV15, BBvL15], as the lending arrangements are typically done on a personal level.
Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL
Corneli, Marco, Latouche, Pierre, Rossi, Fabrice
We develop a model in which interactions between nodes of a dynamic network are counted by non homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes only depend on the clusters of nodes. In order to make inference tractable we move to discrete time by partitioning the entire time horizon in which interactions are observed in fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows to estimate the memberships to clusters and the number of clusters simultaneously. Then a maximum-likelihood estimator is developed to estimate non parametrically the integrated intensities. We discuss the over-fitting problems of the model and propose a regularized version solving these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology.
Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks
Corneli, Marco, Latouche, Pierre, Rossi, Fabrice
The stochastic block model (SBM) is a flexible probabilistic tool that can be used to model interactions between clusters of nodes in a network. However, it does not account for interactions of time varying intensity between clusters. The extension of the SBM developed in this paper addresses this shortcoming through a temporal partition: assuming interactions between nodes are recorded on fixed-length time intervals, the inference procedure associated with the model we propose allows to cluster simultaneously the nodes of the network and the time intervals. The number of clusters of nodes and of time intervals, as well as the memberships to clusters, are obtained by maximizing an exact integrated complete-data likelihood, relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology.
Submodular Variational Inference for Network Reconstruction
Chen, Lin, Crawford, Forrest W, Karbasi, Amin
In real-world and online social networks, individuals receive and transmit information in real time. Cascading information transmissions (e.g. phone calls, text messages, social media posts) may be understood as a realization of a diffusion process operating on the network, and its branching path can be represented by a directed tree. The process only traverses and thus reveals a limited portion of the edges. The network reconstruction/inference problem is to infer the unrevealed connections. Most existing approaches derive a likelihood and attempt to find the network topology maximizing the likelihood, a problem that is highly intractable. In this paper, we focus on the network reconstruction problem for a broad class of real-world diffusion processes, exemplified by a network diffusion scheme called respondent-driven sampling (RDS). We prove that under realistic and general models of network diffusion, the posterior distribution of an observed RDS realization is a Bayesian log-submodular model.We then propose VINE (Variational Inference for Network rEconstruction), a novel, accurate, and computationally efficient variational inference algorithm, for the network reconstruction problem under this model. Crucially, we do not assume any particular probabilistic model for the underlying network. VINE recovers any connected graph with high accuracy as shown by our experimental results on real-life networks.