Goto

Collaborating Authors

 Statistical Learning


Representation of Quasi-Monotone Functionals by Families of Separating Hyperplanes

arXiv.org Machine Learning

We characterize when the level sets of a continuous quasi-monotone functional defined on a suitable convex subset of a normed space can be uniquely represented by a family of bounded continuous functionals. Furthermore, we investigate how regularly these functionals depend on the parameterizing level. Finally, we show how this question relates to the recent problem of property elicitation that simultaneously attracted interest in machine learning, statistical evaluation of forecasts, and finance.


Adaptive Online Learning

arXiv.org Machine Learning

We propose a general framework for studying adaptive regret bounds in the online learning framework, including model selection bounds and data-dependent bounds. Given a data- or model-dependent bound we ask, "Does there exist some algorithm achieving this bound?" We show that modifications to recently introduced sequential complexity measures can be used to answer this question by providing sufficient conditions under which adaptive rates can be achieved. In particular each adaptive rate induces a set of so-called offset complexity measures, and obtaining small upper bounds on these quantities is sufficient to demonstrate achievability. A cornerstone of our analysis technique is the use of one-sided tail inequalities to bound suprema of offset random processes. Our framework recovers and improves a wide variety of adaptive bounds including quantile bounds, second-order data-dependent bounds, and small loss bounds. In addition we derive a new type of adaptive bound for online linear optimization based on the spectral norm, as well as a new online PAC-Bayes theorem that holds for countably infinite sets.


AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

arXiv.org Machine Learning

We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different time. Therefore, we allow the parameter updates to be sensitive to the actual delays experienced, rather than to worst-case bounds on the maximum delay. This sensitivity leads to larger stepsizes, that can help gain rapid initial convergence without having to wait too long for slower machines, while maintaining the same asymptotic complexity. We obtain encouraging improvements to overall convergence for distributed experiments on real datasets with up to billions of examples and features.


The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams

arXiv.org Machine Learning

Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments. Moreover, the learning system must be parameterless ---traditional tuning methods are problematic in streaming settings--- and avoid requiring prior knowledge of the number of distinct class labels occurring in the stream. In this paper, we introduce a new algorithmic approach for nonparametric learning in data streams. Our approach addresses all above mentioned challenges by learning a model that covers the input space using simple local classifiers. The distribution of these classifiers dynamically adapts to the local (unknown) complexity of the classification problem, thus achieving a good balance between model complexity and predictive accuracy. We design four variants of our approach of increasing adaptivity. By means of an extensive empirical evaluation against standard nonparametric baselines, we show state-of-the-art results in terms of accuracy versus model size. For the variant that imposes a strict bound on the model size, we show better performance against all other methods measured at the same model size value. Our empirical analysis is complemented by a theoretical performance guarantee which does not rely on any stochastic assumption on the source generating the stream.


Review and Perspective for Distance Based Trajectory Clustering

arXiv.org Machine Learning

TRAJECTORY is a set of positional information for a moving object, ordered by time. This kind of multidimensional data is prevalent in many fields and applications, for example, to understand migration patterns by studying trajectories of animals, predict meteorology with hurricane data, improve athletes performance, etc. Our study is concentrated on vehicle trajectories within a road network. The growing use of GPS receivers and WIFI embedded mobile devices equipped with hardware for storing data enables us to collect a very large amount of data, that has to be analyzed in order to extract any relevant information. The complexity of the extracted data makes it a difficult challenge.


Introduction to Cross-Entropy Clustering The R Package CEC

arXiv.org Machine Learning

The R Package CEC performs clustering based on the cross-entropy clustering (CEC) method, which was recently developed with the use of information theory. The main advantage of CEC is that it combines the speed and simplicity of $k$-means with the ability to use various Gaussian mixture models and reduce unnecessary clusters. In this work we present a practical tutorial to CEC based on the R Package CEC. Functions are provided to encompass the whole process of clustering.


Time Series Clustering via Community Detection in Networks

arXiv.org Machine Learning

In this paper, we propose a technique for time series clustering using community detection in complex networks. Firstly, we present a method to transform a set of time series into a network using different distance functions, where each time series is represented by a vertex and the most similar ones are connected. Then, we apply community detection algorithms to identify groups of strongly connected vertices (called a community) and, consequently, identify time series clusters. Still in this paper, we make a comprehensive analysis on the influence of various combinations of time series distance functions, network generation methods and community detection techniques on clustering results. Experimental study shows that the proposed network-based approach achieves better results than various classic or up-to-date clustering techniques under consideration. Statistical tests confirm that the proposed method outperforms some classic clustering algorithms, such as $k$-medoids, diana, median-linkage and centroid-linkage in various data sets. Interestingly, the proposed method can effectively detect shape patterns presented in time series due to the topological structure of the underlying network constructed in the clustering process. At the same time, other techniques fail to identify such patterns. Moreover, the proposed method is robust enough to group time series presenting similar pattern but with time shifts and/or amplitude variations. In summary, the main point of the proposed method is the transformation of time series from time-space domain to topological domain. Therefore, we hope that our approach contributes not only for time series clustering, but also for general time series analysis tasks.


A Dictionary Learning Approach for Factorial Gaussian Models

arXiv.org Machine Learning

In this paper, we develop a parameter estimation method for factorially parametrized models such as Factorial Gaussian Mixture Model and Factorial Hidden Markov Model. Our contributions are two-fold. First, we show that the emission matrix of the standard Factorial Model is unidentifiable even if the true assignment matrix is known. Secondly, we address the issue of identifiability by making a one component sharing assumption and derive a parameter learning algorithm for this case. Our approach is based on a dictionary learning problem of the form $X = O R$, where the goal is to learn the dictionary $O$ given the data matrix $X$. We argue that due to the specific structure of the activation matrix $R$ in the shared component factorial mixture model, and an incoherence assumption on the shared component, it is possible to extract the columns of the $O$ matrix without the need for alternating between the estimation of $O$ and $R$.


Robust Subspace Clustering via Smoothed Rank Approximation

arXiv.org Machine Learning

Matrix rank minimizing subject to affine constraints arises in many application areas, ranging from signal processing to machine learning. Nuclear norm is a convex relaxation for this problem which can recover the rank exactly under some restricted and theoretically interesting conditions. However, for many real-world applications, nuclear norm approximation to the rank function can only produce a result far from the optimum. To seek a solution of higher accuracy than the nuclear norm, in this paper, we propose a rank approximation based on Logarithm-Determinant. We consider using this rank approximation for subspace clustering application. Our framework can model different kinds of errors and noise. Effective optimization strategy is developed with theoretical guarantee to converge to a stationary point. The proposed method gives promising results on face clustering and motion segmentation tasks compared to the state-of-the-art subspace clustering algorithms.


Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo

arXiv.org Machine Learning

We present a novel approach for fully non-stationary Gaussian process regression (GPR), where all three key parameters -- noise variance, signal variance and lengthscale -- can be simultaneously input-dependent. We develop gradient-based inference methods to learn the unknown function and the non-stationary model parameters, without requiring any model approximations. We propose to infer full parameter posterior with Hamiltonian Monte Carlo (HMC), which conveniently extends the analytical gradient-based GPR learning by guiding the sampling with model gradients. We also learn the MAP solution from the posterior by gradient ascent. In experiments on several synthetic datasets and in modelling of temporal gene expression, the nonstationary GPR is shown to be necessary for modeling realistic input-dependent dynamics, while it performs comparably to conventional stationary or previous non-stationary GPR models otherwise.