Statistical Learning
Volume Regularization for Binary Classification
We introduce a large-volume box classification for binary prediction, which maintains a subset of weight vectors, and specifically axis-aligned boxes. Our learning algorithm seeks for a box of large volume that contains ``simple'' weight vectors which most of are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of $30$ NLP datasets and binarized USPS optical character recognition datasets.
Random function priors for exchangeable arrays with applications to graphs and relational data
Lloyd, James, Orbanz, Peter, Ghahramani, Zoubin, Roy, Daniel M.
A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlyingrelations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases.
Graphical Gaussian Vector for Image Categorization
Harada, Tatsuya, Kuniyoshi, Yasuo
This paper proposes a novel image representation called a Graphical Gaussian Vector, which is a counterpart of the codebook and local feature matching approaches. In our method, we model the distribution of local features as a Gaussian Markov Random Field (GMRF) which can efficiently represent the spatial relationship among local features. We consider the parameter of GMRF as a feature vector of the image. Using concepts of information geometry, proper parameters and a metric from the GMRF can be obtained. Finally we define a new image feature by embedding the metric into the parameters, which can be directly applied to scalable linear classifiers. Our method obtains superior performance over the state-of-the-art methods in the standard object recognition datasets and comparable performance in the scene dataset. As the proposed method simply calculates the local auto-correlations of local features, it is able to achieve both high classification accuracy and high efficiency.
Active Learning of Multi-Index Function Models
We consider the problem of actively learning \textit{multi-index} functions of the form $f(\vecx) = g(\matA\vecx)= \sum_{i=1}^k g_i(\veca_i^T\vecx)$ from point evaluations of $f$. We assume that the function $f$ is defined on an $\ell_2$-ball in $\Real^d$, $g$ is twice continuously differentiable almost everywhere, and $\matA \in \mathbb{R}^{k \times d}$ is a rank $k$ matrix, where $k \ll d$. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive an estimator of the function $f$ along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds are quite accurate.
Reducing statistical time-series problems to binary classification
We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data.
Phoneme Classification using Constrained Variational Gaussian Process Dynamical System
Park, Hyunsin, Yun, Sungrack, Park, Sanghyuk, Kim, Jongmin, Yoo, Chang D.
This paper describes a new acoustic model based on variational Gaussian process dynamical system (VGPDS) for phoneme classification. The proposed model overcomes the limitations of the classical HMM in modeling the real speech data, by adopting a nonlinear and nonparametric model. In our model, the GP prior on the dynamics function enables representing the complex dynamic structure of speech, while the GP prior on the emission function successfully models the global dependency over the observations. Additionally, we introduce variance constraint to the original VGPDS for mitigating sparse approximation error of the kernel matrix. The effectiveness of the proposed model is demonstrated with extensive experimental results including parameter estimation, classification performance on the synthetic and benchmark datasets.
Shifting Weights: Adapting Object Detectors from Image to Video
Tang, Kevin, Ramanathan, Vignesh, Fei-fei, Li, Koller, Daphne
Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by retraining the detector with automatically discoveredtarget domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples,and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robustapproach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video.
A Polynomial-time Form of Robust Regression
Yu, Yao-liang, Aslan, รzlem, Schuurmans, Dale
Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression --Variational M-estimation--that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods.
No-Regret Algorithms for Unconstrained Online Convex Optimization
Mcmahan, Brendan, Streeter, Matthew
Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x* are known in advance. We present an algorithm that, without such prior knowledge, offers near-optimal regret bounds with respect to _any_ choice of x*. In particular, regret with respect to x* = 0 is _constant_. We then prove lower bounds showing that our algorithm's guarantees are optimal in this setting up to constant factors.
Dip-means: an incremental clustering method for estimating the number of clusters
Kalogeratos, Argyris, Likas, Aristidis
Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that may be used as a wrapper around any iterative clustering algorithm of the k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as a ''viewer'' and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of the distances between the viewer and the cluster members. Two important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.