Statistical Learning
Layered Dynamic Textures
Chan, Antoni B., Vasconcelos, Nuno
A dynamic texture is a video model that treats a video as a sample from a spatiotemporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
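As a rough illustration of what "a sample from a linear dynamical system" means here, the sketch below draws a short sequence of frames from a single (non-layered) system of the form x_{t+1} = A x_t + v_t, y_t = C x_t + w_t. The matrices, noise levels, and the function name `simulate_lds` are hypothetical choices made for this example; they are not taken from the paper, and the layered model and its EM learning procedure are not shown.

```python
import random

def simulate_lds(A, C, x0, steps, state_noise=0.1, obs_noise=0.05, seed=0):
    """Sample frames y_t from x_{t+1} = A x_t + v_t, y_t = C x_t + w_t.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    x = list(x0)
    frames = []
    for _ in range(steps):
        # Observation: each "pixel" is a noisy linear function of the hidden state.
        y = [sum(c * xi for c, xi in zip(row, x)) + rng.gauss(0.0, obs_noise)
             for row in C]
        frames.append(y)
        # State update: linear dynamics plus Gaussian driving noise.
        x = [sum(a * xi for a, xi in zip(row, x)) + rng.gauss(0.0, state_noise)
             for row in A]
    return frames

# Hypothetical 2-D hidden state (a damped rotation) observed through 3 pixels.
A = [[0.95, -0.20], [0.20, 0.95]]
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
frames = simulate_lds(A, C, x0=[1.0, 0.0], steps=50)
```

Because one state vector drives every pixel, a single system of this kind produces one coherent motion pattern, which is the limitation the abstract describes for video with several distinct moving regions.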
Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
Blanchard, Gilles, Sugiyama, Masashi, Kawanabe, Motoaki, Spokoiny, Vladimir, Müller, Klaus-Robert
We propose a new linear method for dimension reduction to identify non-Gaussian components in high dimensional data. Our method, NGCA (non-Gaussian component analysis), uses a very general semi-parametric framework. In contrast to existing projection methods we define what is uninteresting (Gaussian): by projecting out uninterestingness, we can estimate the relevant non-Gaussian subspace. We show that the estimation error of finding the non-Gaussian components tends to zero at a parametric rate. Once NGCA components are identified and extracted, various tasks can be applied in the data analysis process, like data visualization, clustering, denoising or classification. A numerical study demonstrates the usefulness of our method.
Convex Neural Networks
Bengio, Yoshua, Roux, Nicolas L., Vincent, Pascal, Delalleau, Olivier, Marcotte, Patrice
Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting a hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
Non-Local Manifold Parzen Windows
Bengio, Yoshua, Larochelle, Hugo, Vincent, Pascal
To escape from the curse of dimensionality, we claim that one can learn non-local functions, in the sense that the value and shape of the learned function at x must be inferred using examples that may be far from x. With this objective, we present a non-local nonparametric density estimator. It builds upon previously proposed Gaussian mixture models with regularized covariance matrices to take into account the local shape of the manifold. It also builds upon recent work on non-local estimators of the tangent plane of a manifold, which are able to generalize in places with little training data, unlike traditional, local, nonparametric models.
The Curse of Highly Variable Functions for Local Kernel Machines
Bengio, Yoshua, Delalleau, Olivier, Roux, Nicolas L.
We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior, with similarity between examples expressed with a local kernel, are sensitive to the curse of dimensionality, or more precisely to the variability of the target. Our discussion covers supervised, semi-supervised and unsupervised learning algorithms. These algorithms are found to be local in the sense that crucial properties of the learned function at x depend mostly on the neighbors of x in the training set. This makes them sensitive to the curse of dimensionality, well studied for classical nonparametric statistical learning. We show in the case of the Gaussian kernel that when the function to be learned has many variations, these algorithms require a number of training examples proportional to the number of variations, which could be large even though there may exist short descriptions of the target function, i.e., its Kolmogorov complexity may be low. This suggests that there exist non-local learning algorithms that at least have the potential to learn about such structured but apparently complex functions (because locally they have many variations), while not using very specific prior domain knowledge.
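To make the notion of locality concrete, the following is a minimal sketch using Nadaraya-Watson kernel regression with a Gaussian kernel in one dimension. This estimator is chosen here only as a simple example of a local kernel method; it is an assumption of this illustration, not necessarily one of the algorithms analyzed in the paper, and the data and bandwidth are hypothetical.

```python
import math

def gaussian_kernel_weights(x, train_x, bandwidth):
    """Normalized Gaussian kernel weights of each training point at query x."""
    raw = [math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2)) for xi in train_x]
    total = sum(raw)
    return [r / total for r in raw]

def kernel_regression(x, train_x, train_y, bandwidth):
    """Nadaraya-Watson prediction: a weighted average of training targets."""
    weights = gaussian_kernel_weights(x, train_x, bandwidth)
    return sum(w * y for w, y in zip(weights, train_y))

# Hypothetical 1-D training set: three points near the query, two far away.
train_x = [0.0, 0.5, 1.0, 5.0, 10.0]
train_y = [0.0, 1.0, 0.0, 1.0, 0.0]

weights = gaussian_kernel_weights(0.5, train_x, bandwidth=0.5)
prediction = kernel_regression(0.5, train_x, train_y, bandwidth=0.5)
```

At the query point 0.5, almost all of the weight falls on the three nearby examples, so the distant examples have essentially no influence on the prediction. Each variation of the target therefore has to be covered by its own nearby training examples, which is the intuition behind the sample-size argument in the abstract.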
Learning Topology with the Generative Gaussian Graph and the EM Algorithm
Given a set of points and a set of prototypes representing them, how can one create a graph of the prototypes whose topology accounts for that of the points? This problem had not yet been explored in the framework of statistical learning theory. In this work, we propose a generative model based on the Delaunay graph of the prototypes and the Expectation-Maximization algorithm to learn the parameters. This work is a first step towards the construction of a topological model of a set of points grounded on statistics.
Maximum Margin Semi-Supervised Learning for Structured Variables
Altun, Y., McAllester, D., Belkin, M.
Many real-world classification problems involve the prediction of multiple interdependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables.
Large-scale biophysical parameter estimation in single neurons via constrained linear regression
Ahrens, Misha, Paninski, Liam, Huys, Quentin J.
Our understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell's membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels' reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g.