Goto

Collaborating Authors

 Genre


Metrics for Multivariate Dictionaries

arXiv.org Machine Learning

Overcomplete representations and dictionary learning algorithms kept attracting a growing interest in the machine learning community. This paper addresses the emerging problem of comparing multivariate overcomplete representations. Despite a recurrent need to rely on a distance for learning or assessing multivariate overcomplete representations, no metrics in their underlying spaces have yet been proposed. Henceforth we propose to study overcomplete representations from the perspective of frame theory and matrix manifolds. We consider distances between multivariate dictionaries as distances between their spans which reveal to be elements of a Grassmannian manifold. We introduce Wasserstein-like set-metrics defined on Grassmannian spaces and study their properties both theoretically and numerically. Indeed a deep experimental study based on tailored synthetic datasetsand real EEG signals for Brain-Computer Interfaces (BCI) have been conducted. In particular, the introduced metrics have been embedded in clustering algorithm and applied to BCI Competition IV-2a for dataset quality assessment. Besides, a principled connection is made between three close but still disjoint research fields, namely, Grassmannian packing, dictionary learning and compressed sensing.


An Introductory Study on Time Series Modeling and Forecasting

arXiv.org Machine Learning

Time series modeling and forecasting has fundamental importance to various practical domains. Thus a lot of active research works is going on in this subject during several years. Many important models have been proposed in literature for improving the accuracy and effectiveness of time series forecasting. The aim of this dissertation work is to present a concise description of some popular time series forecasting models used in practice, with their salient features. In this thesis, we have described three important classes of time series models, viz. the stochastic, neural networks and SVM based models, together with their inherent forecasting strengths and weaknesses. We have also discussed about the basic issues related to time series modeling, such as stationarity, parsimony, overfitting, etc. Our discussion about different time series models is supported by giving the experimental forecast results, performed on six real time series datasets. While fitting a model to a dataset, special care is taken to select the most parsimonious one. To evaluate forecast accuracy as well as to compare among different models fitted to a time series, we have used the five performance measures, viz. MSE, MAD, RMSE, MAPE and Theil's U-statistics. For each of the six datasets, we have shown the obtained forecast diagram which graphically depicts the closeness between the original and forecasted observations. To have authenticity as well as clarity in our discussion about time series modeling and forecasting, we have taken the help of various published research works from reputed journals and some standard books.


A Conformal Prediction Approach to Explore Functional Data

arXiv.org Machine Learning

This paper applies conformal prediction techniques to compute simultaneous prediction bands and clustering trees for functional data. These tools can be used to detect outliers and clusters. Both our prediction bands and clustering trees provide prediction sets for the underlying stochastic process with a guaranteed finite sample behavior, under no distributional assumptions. The prediction sets are also informative in that they correspond to the high density region of the underlying process. While ordinary conformal prediction has high computational cost for functional data, we use the inductive conformal predictor, together with several novel choices of conformity scores, to simplify the computation. Our methods are illustrated on some real data examples.


Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLasso

arXiv.org Machine Learning

The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.


Phoneme discrimination using $KS$-algebra II

arXiv.org Machine Learning

$KS$-algebra consists of expressions constructed with four kinds operations, the minimum, maximum, difference and additively homogeneous generalized means. Five families of $Z$-classifiers are investigated on binary classification tasks between English phonemes. It is shown that the classifiers are able to reflect well known formant characteristics of vowels, while having very small Kolmogoroff's complexity.


On learning parametric-output HMMs

arXiv.org Machine Learning

Hidden Markov Models (HMM) are a standard tool in the modeling and analysis of time series with a wide variety of applications. When the number of hidden states is known, the standard method for estimating the HMM parameters from given observed data is the Baum-Welch algorithm [Baum et al., 1970]. The latter is known to suffer from two serious drawbacks: it 1 tends to converge (i) very slowly and (ii) only to a local maximum. Indeed, the problem of recovering the parameters of a general HMM is provably hard, in several distinct senses [Abe and Warmuth, 1992, Lyngsø and Pedersen, 2001, Terwijn, 2002]. In this paper we consider learning parametric-output HMMs with a finite and known number of hidden states, where the output from each hidden state follows a parametric distribution from a given family.


Learning a Factor Model via Regularized PCA

arXiv.org Machine Learning

Linear factor models have been widely used for a long time and with notable success in economics, finance, medicine, psychology, and various other natural and social sciences (Harman, 1976). In such a model, each observed variable is a linear combination of unobserved common factors plus idiosyncratic noise, and the collection of random variables is jointly Gaussian. We consider in this paper the problem of learning a factor model from a training set of vector observations. In particular, our learning problem entails simultaneously estimating the loadings of each factor and the residual variance of each variable. We seek an estimate of these parameters that best explains out-of-sample data.


Accelerated Linear SVM Training with Adaptive Variable Selection Frequencies

arXiv.org Machine Learning

Support vector machine (SVM) training is an active research area since the dawn of the method. In recent years there has been increasing interest in specialized solvers for the important case of linear models. The algorithm presented by Hsieh et al., probably best known under the name of the "liblinear" implementation, marks a major breakthrough. The method is analog to established dual decomposition algorithms for training of non-linear SVMs, but with greatly reduced computational complexity per update step. This comes at the cost of not keeping track of the gradient of the objective any more, which excludes the application of highly developed working set selection algorithms. We present an algorithmic improvement to this method. We replace uniform working set selection with an online adaptation of selection frequencies. The adaptation criterion is inspired by modern second order working set selection methods. The same mechanism replaces the shrinking heuristic. This novel technique speeds up training in some cases by more than an order of magnitude.


Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint

arXiv.org Machine Learning

Deep Belief Networks (DBN) have been successfully applied on popular machine learning tasks. Specifically, when applied on hand-written digit recognition, DBNs have achieved approximate accuracy rates of 98.8%. In an effort to optimize the data representation achieved by the DBN and maximize their descriptive power, recent advances have focused on inducing sparse constraints at each layer of the DBN. In this paper we present a theoretical approach for sparse constraints in the DBN using the mixed norm for both non-overlapping and overlapping groups. We explore how these constraints affect the classification accuracy for digit recognition in three different datasets (MNIST, USPS, RIMES) and provide initial estimations of their usefulness by altering different parameters such as the group size and overlap percentage.


Learning Theory Approach to Minimum Error Entropy Criterion

arXiv.org Machine Learning

We consider the minimum error entropy (MEE) criterion and an empirical risk minimization learning algorithm in a regression setting. A learning theory approach is presented for this MEE algorithm and explicit error bounds are provided in terms of the approximation ability and capacity of the involved hypothesis space when the MEE scaling parameter is large. Novel asymptotic analysis is conducted for the generalization error associated with Renyi's entropy and a Parzen window function, to overcome technical difficulties arisen from the essential differences between the classical least squares problems and the MEE setting. A semi-norm and the involved symmetrized least squares error are introduced, which is related to some ranking algorithms.