Regression
Joint segmentation of multivariate time series with hidden process regression for human activity recognition
Chamroukhi, Faicel, Mohammed, Samer, Trabelsi, Dorra, Oukhellou, Latifa, Amirat, Yacine
The problem of human activity recognition is central for understanding and predicting the human behavior, in particular in a prospective of assistive services to humans, such as health monitoring, well being, security, etc. There is therefore a growing need to build accurate models which can take into account the variability of the human activities over time (dynamic models) rather than static ones which can have some limitations in such a dynamic context. In this paper, the problem of activity recognition is analyzed through the segmentation of the multidimensional time series of the acceleration data measured in the 3-d space using body-worn accelerometers. The proposed model for automatic temporal segmentation is a specific statistical latent process model which assumes that the observed acceleration sequence is governed by sequence of hidden (unobserved) activities. More specifically, the proposed approach is based on a specific multiple regression model incorporating a hidden discrete logistic process which governs the switching from one activity to another over time. The model is learned in an unsupervised context by maximizing the observed-data log-likelihood via a dedicated expectation-maximization (EM) algorithm. We applied it on a real-world automatic human activity recognition problem and its performance was assessed by performing comparisons with alternative approaches, including well-known supervised static classifiers and the standard hidden Markov model (HMM). The obtained results are very encouraging and show that the proposed approach is quite competitive even it works in an entirely unsupervised way and does not requires a feature extraction preprocessing step.
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
Deleforge, Antoine, Forbes, Florence, Horaud, Radu
In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. We introduce a mixture of locally-linear probabilistic mapping model that starts with estimating the parameters of inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and we describe in detail the associated EM inference procedures that may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.
Functional Bipartite Ranking: a Wavelet-Based Filtering Approach
Clรฉmenรงon, Stรฉphan, Depecker, Marine
Functional Classification, i.e. the binary classification problem when the input observation X (X(t)) is of the form of a (possibly sampled) random curve/function and the output variable Y { 1, 1} is a binary label, has been the subject of a good deal of attention in the machine-learning literature in the past few years, see [1] or [2]. In contrast, Bipartite Ranking, termed Nonparametric Scoring sometimes, has never been tackled in a functional framework, except from the restrictive angle of Functional Logistic Regression, see [3] or [4] for instance. This global learning task consists in ordering all possible input observations X so that positive ones appear on top of the list with highest probability. This predictive problem, which can be cast in terms of ROC curve optimization (see [5]), covers a wide variety of applications, ranging from anomaly detection in signal processing to automatic design of diagnosis tools in medicine through creditscoring in mathematical finance or the conception of search engines in information retrieval. Functional versions of many popular approaches for classification have been developed, relying in general on a preliminary finite dimensional representation/projection of the input data.
Consistent selection of tuning parameters via variable selection stability
Sun, Wei, Wang, Junhui, Fang, Yixin
Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in literature, their performances largely depend on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross-validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on a novel concept of variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. The effectiveness of the proposed criterion is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data.
Oracle Inequalities for Convex Loss Functions with Non-Linear Targets
Caner, Mehmet, Kock, Anders Bredahl
This paper consider penalized empirical loss minimization of convex loss functions with unknown non-linear target functions. Using the elastic net penalty we establish a finite sample oracle inequality which bounds the loss of our estimator from above with high probability. If the unknown target is linear this inequality also provides an upper bound of the estimation error of the estimated parameter vector. These are new results and they generalize the econometrics and statistics literature. Next, we use the non-asymptotic results to show that the excess loss of our estimator is asymptotically of the same order as that of the oracle. If the target is linear we give sufficient conditions for consistency of the estimated parameter vector. Next, we briefly discuss how a thresholded version of our estimator can be used to perform consistent variable selection. We give two examples of loss functions covered by our framework and show how penalized nonparametric series estimation is contained as a special case and provide a finite sample upper bound on the mean square error of the elastic net series estimator.
An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure
We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning and computer vision. In Part 1, we review the concepts of network flows and submodular function optimization theory foundational to our results. We then examine the connections between network flows and the minimum-norm algorithm from submodular optimization, extending and improving several current results. This leads to a concise representation of the structure of a large class of pairwise regularized models important in machine learning, statistics and computer vision. In Part 2, we describe the full regularization path of a class of penalized regression problems with dependent variables that includes the graph-guided LASSO and total variation constrained models. This description also motivates a practical algorithm. This allows us to efficiently find the regularization path of the discretized version of TV penalized models. Ultimately, our new algorithms scale up to high-dimensional problems with millions of variables.
A Component Lasso
Hussami, Nadine, Tibshirani, Robert
We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then solves the subproblems separately, obtaining a coefficient vector for each one. Then, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery.
Families of Parsimonious Finite Mixtures of Regression Models
Dang, Utkarsh J., McNicholas, Paul D.
Model-based clustering has become increasingly popular during the last decade. Parametric mixture models are used in model-based clustering; however, such models generally do not exploit covariates. Incorporating a regression structure can yield important insight when there is a regression relationship between some variables. Methodologies that deal with such data include finite mixtures of regressions (FMR; [7, 13]) and finite mixtures of regressions with concomitant variables (FMRC; [22]), supported by the popular flexmix package [13]. Multivariate correlated responses can be naturally integrated into such models. However, flexmix currently does not account for correlated response variables for both FMR and FMRC. FMR models that deal with correlated response variables have recently been proposed [19, 9].
On Approximate Inference for Generalized Gaussian Process Models
Shang, Lifeng, Chan, Antoni B.
A generalized Gaussian process model (GGPM) is a unifying framework that encompasses many existing Gaussian process (GP) models, such as GP regression, classification, and counting. In the GGPM framework, the observation likelihood of the GP model is itself parameterized using the exponential family distribution (EFD). In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD. A particular GP model and its associated inference algorithms can then be formed by changing the parameters of the EFD, thus greatly simplifying its creation for task-specific output domains. We demonstrate the efficacy of this framework by creating several new GP models for regressing to non-negative reals and to real intervals. We also consider a closed-form Taylor approximation for efficient inference on GGPMs, and elaborate on its connections with other model-specific heuristic closed-form approximations. Finally, we present a comprehensive set of experiments to compare approximate inference algorithms on a wide variety of GGPMs.
Local and global asymptotic inference in smoothing spline models
This article studies local and global inference for smoothing spline estimation in a unified asymptotic framework. We first introduce a new technical tool called functional Bahadur representation, which significantly generalizes the traditional Bahadur representation in parametric models, that is, Bahadur [Ann. Inst. Statist. Math. 37 (1966) 577-580]. Equipped with this tool, we develop four interconnected procedures for inference: (i) pointwise confidence interval; (ii) local likelihood ratio testing; (iii) simultaneous confidence band; (iv) global likelihood ratio testing. In particular, our confidence intervals are proved to be asymptotically valid at any point in the support, and they are shorter on average than the Bayesian confidence intervals proposed by Wahba [J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 133-150] and Nychka [J. Amer. Statist. Assoc. 83 (1988) 1134-1143]. We also discuss a version of the Wilks phenomenon arising from local/global likelihood ratio testing. It is also worth noting that our simultaneous confidence bands are the first ones applicable to general quasi-likelihood models. Furthermore, issues relating to optimality and efficiency are carefully addressed. As a by-product, we discover a surprising relationship between periodic and nonperiodic smoothing splines in terms of inference.