Statistical Learning
Graph connection Laplacian and random matrices with random blocks
Karoui, Noureddine El, Wu, Hau-tieng
Graph connection Laplacian (GCL) is a modern data analysis technique that is starting to be applied for the analysis of high dimensional and massive datasets. Motivated by this technique, we study matrices that are akin to the ones appearing in the null case of GCL, i.e the case where there is no structure in the dataset under investigation. Developing this understanding is important in making sense of the output of the algorithms based on GCL. We hence develop a theory explaining the behavior of the spectral distribution of a large class of random matrices, in particular random matrices with random block entries of fixed size. Part of the theory covers the case where there is significant dependence between the blocks. Numerical work shows that the agreement between our theoretical predictions and numerical simulations is generally very good.
A framework for studying synaptic plasticity with neural spike train data
Linderman, Scott W., Stock, Christopher H., Adams, Ryan P.
Synaptic plasticity is believed to be the fundamental building block of learning and memory in the brain. Its study is of crucial importance to understanding the activity and function of neural circuits. With innovations in neural recording technology providing access to the simultaneous activity of increasingly large populations of neurons, statistical models are promising tools for formulating and testing hypotheses about the dynamics of synaptic connectivity. Advances in optical techniques (Packer et al., 2012; Hochbaum et al., 2014), for example, have made it possible to simultaneously record from and stimulate large populations of synaptically connected neurons. Armed with statistical tools capable of inferring time-varying synaptic connectivity, neuroscientists could test competing models of synaptic plasticity, discover new learning rules at the monosynaptic and network level, investigate the effects of disease on synaptic plasticity, and potentially design stimuli to modify neural networks. Despite the popularity of GLMs for spike data, relatively little work has attempted to model the time-varying nature of neural interactions. Here we model interaction weights as a dynamical system governed by parametric synaptic plasticity rules. To perform inference in this model, we use particle Markov Chain Monte Carlo (pMCMC) (Andrieu et al., 2010), a recently developed inference technique for complex time series. We use this new modeling framework to examine the problem of using recorded data to distinguish between proposed variants of spike-timing-dependent plasticity (STDP) learning rules.
Learning Multi-Relational Semantics Using Neural-Embedding Models
Yang, Bishan, Yih, Wen-tau, He, Xiaodong, Gao, Jianfeng, Deng, Li
In this paper we present a unified framework for modeling multi-relational representations, scoring, and learning, and conduct an empirical study of several recent multi-relational embedding models under the framework. We investigate the different choices of relation operators based on linear and bilinear transformations, and also the effects of entity representations by incorporating unsupervised vectors pre-trained on extra textual resources. Our results show several interesting findings, enabling the design of a simple embedding model that achieves the new state-of-the-art performance on a popular knowledge base completion task evaluated on Freebase.
A unified view of generative models for networks: models, methods, opportunities, and challenges
Jacobs, Abigail Z., Clauset, Aaron
These efforts have produced a diverse ecology of models and methods. Despite this diversity, many of these models share a common underlying structure: pairwise interactions (edges) are generated with probability conditional on latent vertex attributes. Differences between models generally stem from different philosophical choices about how to learn from data or different empirically-motivated goals. The highly interdisciplinary nature of work on these generative models, however, has inhibited the development of a unified view of their similarities and differences. For instance, novel theoretical models and optimization techniques developed in machine learning are largely unknown within the social and biological sciences, which have instead emphasized model interpretability. Here, we describe a unified view of generative models for networks that draws together many of these disparate threads and highlights the fundamental similarities and differences that span these fields. We then describe a number of opportunities and challenges for future work that are revealed by this view.
Dynamic Programming for Instance Annotation in Multi-instance Multi-label Learning
Pham, Anh T., Raich, Raviv, Fern, Xiaoli Z.
Labeling data for classification requires significant human effort. To reduce labeling cost, instead of labeling every instance, a group of instances (bag) is labeled by a single bag label. Computer algorithms are then used to infer the label for each instance in a bag, a process referred to as instance annotation. This task is challenging due to the ambiguity regarding the instance labels. We propose a discriminative probabilistic model for the instance annotation problem and introduce an expectation maximization framework for inference, based on the maximum likelihood approach. For many probabilistic approaches, brute-force computation of the instance label posterior probability given its bag label is exponential in the number of instances in the bag. Our key contribution is a dynamic programming method for computing the posterior that is linear in the number of instances. We evaluate our methods using both benchmark and real world data sets, in the domain of bird song, image annotation, and activity recognition. In many cases, the proposed framework outperforms, sometimes significantly, the current state-of-the-art MIML learning methods, both in instance label prediction and bag label prediction.
Statistical Models for Degree Distributions of Networks
Sadeghi, Kayvan, Rinaldo, Alessandro
We define and study the statistical models in exponential family form whose sufficient statistics are the degree distributions and the bi-degree distributions of undirected labelled simple graphs. Graphs that are constrained by the joint degree distributions are called $dK$-graphs in the computer science literature and this paper attempts to provide the first statistically grounded analysis of this type of models. In addition to formalizing these models, we provide some preliminary results for the parameter estimation and the asymptotic behaviour of the model for degree distribution, and discuss the parameter estimation for the model for bi-degree distribution.
Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions
Wang, Mengdi, Fang, Ethan X., Liu, Han
Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e., problems of the form $\min_x \mathbf{E}_v [f_v\big(\mathbf{E}_w [g_w(x)]\big)]$. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of quasi-gradient method. SCGD update the solutions based on noisy sample gradients of $f_v,g_{w}$ and use an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that the SCGD converge almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, the SCGD achieve a convergence rate of $O(k^{-1/4})$ in the general case and $O(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide the convergence rate analysis. Indeed, the stochastic setting where one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
A convex formulation for hyperspectral image superresolution via subspace-based regularization
Simรตes, Miguel, Bioucas-Dias, Josรฉ, Almeida, Luis B., Chanussot, Jocelyn
Hyperspectral remote sensing images (HSIs) usually have high spectral resolution and low spatial resolution. Conversely, multispectral images (MSIs) usually have low spectral and high spatial resolutions. The problem of inferring images which combine the high spectral and high spatial resolutions of HSIs and MSIs, respectively, is a data fusion problem that has been the focus of recent active research due to the increasing availability of HSIs and MSIs retrieved from the same geographical area. We formulate this problem as the minimization of a convex objective function containing two quadratic data-fitting terms and an edge-preserving regularizer. The data-fitting terms account for blur, different resolutions, and additive noise. The regularizer, a form of vector Total Variation, promotes piecewise-smooth solutions with discontinuities aligned across the hyperspectral bands. The downsampling operator accounting for the different spatial resolutions, the non-quadratic and non-smooth nature of the regularizer, and the very large size of the HSI to be estimated lead to a hard optimization problem. We deal with these difficulties by exploiting the fact that HSIs generally "live" in a low-dimensional subspace and by tailoring the Split Augmented Lagrangian Shrinkage Algorithm (SALSA), which is an instance of the Alternating Direction Method of Multipliers (ADMM), to this optimization problem, by means of a convenient variable splitting. The spatial blur and the spectral linear operators linked, respectively, with the HSI and MSI acquisition processes are also estimated, and we obtain an effective algorithm that outperforms the state-of-the-art, as illustrated in a series of experiments with simulated and real-life data.
Detecting change points in the large-scale structure of evolving networks
Interactions among people or objects are often dynamic in nature and can be represented as a sequence of networks, each providing a snapshot of the interactions over a brief period of time. An important task in analyzing such evolving networks is change-point detection, in which we both identify the times at which the large-scale pattern of interactions changes fundamentally and quantify how large and what kind of change occurred. Here, we formalize for the first time the network change-point detection problem within an online probabilistic learning framework and introduce a method that can reliably solve it. This method combines a generalized hierarchical random graph model with a Bayesian hypothesis test to quantitatively determine if, when, and precisely how a change point has occurred. We analyze the detectability of our method using synthetic data with known change points of different types and magnitudes, and show that this method is more accurate than several previously used alternatives. Applied to two high-resolution evolving social networks, this method identifies a sequence of change points that align with known external "shocks" to these networks.
Joint modeling of multiple time series via the beta process with application to motion capture segmentation
Fox, Emily B., Hughes, Michael C., Sudderth, Erik B., Jordan, Michael I.
We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our model discovers a latent set of dynamical behaviors shared among the sequences, and segments each time series into regions defined by a subset of these behaviors. Using a beta process prior, the size of the behavior set and the sharing pattern are both inferred from data. We develop Markov chain Monte Carlo (MCMC) methods based on the Indian buffet process representation of the predictive distribution of the beta process. Our MCMC inference algorithm efficiently adds and removes behaviors via novel split-merge moves as well as data-driven birth and death proposals, avoiding the need to consider a truncated model. We demonstrate promising results on unsupervised segmentation of human motion capture data.