Regression
A Likelihood Ratio Framework for High Dimensional Semiparametric Regression
Ning, Yang, Zhao, Tianqi, Liu, Han
We propose a likelihood ratio based inferential framework for high dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high dimensional data analysis, including incomplete data, selection bias, and heterogeneous multitask learning. Our work has three main contributions. (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model without the need of estimating the unknown base measure function. (ii) We propose a new framework to construct post-regularization confidence regions and tests for the low dimensional components of high dimensional parameters. Unlike existing post-regularization inferential methods, our approach is based on a novel directional likelihood. In particular, the framework naturally handles generic regularized estimators with nonconvex penalty functions and it can be used to infer least false parameters under misspecified models. (iii) We develop new concentration inequalities and normal approximation results for U-statistics with unbounded kernels, which are of independent interest. We demonstrate the consequences of the general theory by using an example of missing data problem. Extensive simulation studies and real data analysis are provided to illustrate our proposed approach.
Kernel Additive Principal Components
Tan, Xin Lu, Buja, Andreas, Ma, Zongming
Additive principal components (APCs for short) are a nonlinear generalization of linear principal components. We focus on smallest APCs to describe additive nonlinear constraints that are approximately satisfied by the data. Thus APCs fit data with implicit equations that treat the variables symmetrically, as opposed to regression analyses which fit data with explicit equations that treat the data asymmetrically by singling out a response variable. We propose a regularized data-analytic procedure for APC estimation using kernel methods. In contrast to existing approaches to APCs that are based on regularization through subspace restriction, kernel methods achieve regularization through shrinkage and therefore grant distinctive flexibility in APC estimation by allowing the use of infinite-dimensional functions spaces for searching APC transformation while retaining computational feasibility. To connect population APCs and kernelized finite-sample APCs, we study kernelized population APCs and their associated eigenproblems, which eventually lead to the establishment of consistency of the estimated APCs. Lastly, we discuss an iterative algorithm for computing kernelized finite-sample APCs.
GAP Safe screening rules for sparse multi-task and multi-class models
Ndiaye, Eugene, Fercoq, Olivier, Gramfort, Alexandre, Salmon, Joseph
High dimensional regression benefits from sparsity promoting regularizations. Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven not to discard features wrongly the rules are said to be \emph{safe}. In this paper we derive new safe rules for generalized linear models regularized with $\ell_1$ and $\ell_1/\ell_2$ norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This allows to discard safely more variables, in particular for low regularization parameters. The GAP Safe rule can cope with any iterative solver and we illustrate its performance on coordinate descent for multi-task Lasso, binary and multinomial logistic regression, demonstrating significant speed ups on all tested datasets with respect to previous safe rules.
A Random Forest Guided Tour
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models
Naeini, Mahdi Pakdaman, Cooper, Gregory F.
Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called \textit{ensemble of near isotonic regression} (ENIR). The method can be considered as an extension of BBQ, a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression. ENIR is designed to address the key limitation of isotonic regression which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be combined with many existing classification models. We demonstrate the performance of ENIR on synthetic and real datasets for the commonly used binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is $O(N \log N)$ time, where $N$ is the number of samples.
Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference
Yang, Zhuoran, Wang, Zhaoran, Liu, Han, Eldar, Yonina C., Zhang, Tong
We study parameter estimation and asymptotic inference for sparse nonlinear regression. More specifically, we assume the data are given by $y = f( x^\top \beta^* ) + \epsilon$, where $f$ is nonlinear. To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator. Unlike classical linear regression, the corresponding optimization problem is nonconvex because of the nonlinearity of $f$. In spite of the nonconvexity, we prove that under mild conditions, every stationary point of the objective enjoys an optimal statistical rate of convergence. In addition, we provide an efficient algorithm that provably converges to a stationary point. We also access the uncertainty of the obtained estimator. Specifically, based on any stationary point of the objective, we construct valid hypothesis tests and confidence intervals for the low dimensional components of the high-dimensional parameter $\beta^*$. Detailed numerical results are provided to back up our theory.
A Sparse and Adaptive Prior for Time-Dependent Model Parameters
Yogatama, Dani, Routledge, Bryan R., Smith, Noah A.
We consider the scenario where the parameters of a probabilistic model are expected to vary over time. We construct a novel prior distribution that promotes sparsity and adapts the strength of correlation between parameters at successive timesteps, based on the data. We derive approximate variational inference procedures for learning and prediction with this prior. We test the approach on two tasks: forecasting financial quantities from relevant text, and modeling language contingent on time-varying financial measurements.
Supervised Learning for Dynamical System Learning
Hefny, Ahmed, Downey, Carlton, Gordon, Geoffrey
Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as L1 regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
Modeling Individual Differences through Frequent Pattern Mining on Role-Playing Game Actions
Chen, Zhengxing (Northeastern University) | Nasr, Magy Seif El (Northeastern University) | Canossa, Alessandro (Northeastern University) | Badler, Jeremy (Northeastern University) | Tignor, Stefanie (Northeastern University) | Colvin, Randy (Northeastern University)
There has been much work on player modeling using game behavioral data collected. Many of the previous research projects that targeted this goal used aggregate game statistics as features to develop behavior models using both statistical and machine learning techniques. While existing methods have already led to interesting findings, we suspect that aggregated features discard valuable information such as temporal or sequential patterns, which may be important in deciphering information about decisionmaking, problem solving, or individual differences. Such sequential information is critical to analyze player behaviors especially in role-playing games (RPG) where players can face ample choices, experience different contexts, behave freely with individual propensities but possibly end up with similar aggregated statistics (e.g., levels, time spent). In this paper we intend to develop and apply a modeling technique that takes into consideration sequential patters to decipher individual differences in playing a Role Playing Game (RPG) game. Using an RPG with multiple affordances, we designed an experiment collecting granular in-game behaviors of 64 players. Using closed sequential pattern mining and logistic regression, we developed a model that uses gameplay action sequences to predict the real world characteristics, including gender, game play expertise and five personality traits (as defined by psychology). The results show that game expertise is a dominant factor that impacts in-game behaviors. The contribution of this paper is the algorithms we developed combined with a validation procedure to determine the reliability and validity of the results and the results themselves.
Predicting Purchase Decisions in Mobile Free-to-Play Games
Sifa, Rafet (Fraunhofer IAIS) | Hadiji, Fabian (TU Dortmund, goedle.io) | Runge, Julian (Wooga GmbH) | Drachen, Anders (Aalborg University) | Kersting, Kristian (TU Dortmund) | Bauckhage, Christian (Fraunhofer IAIS)
Mobile digital games are dominantly released under the freemium business model, but only a small fraction of the players makes any purchases. The ability to predict who will make a purchase enables optimization of marketing efforts, and tailoring customer relationship management to the specific user's profile. Here this challenge is addressed via two models for predicting purchasing players, using a 100,000 player dataset: 1) A classification model focused on predicting whether a purchase will occur or not. 2) a regression model focused on predicting the number of purchases a user will make. Both models are presented within a decision and regression tree framework for building rules that are actionable by companies. To the best of our knowledge, this is the first study investigating purchase decisions in freemium mobile products from a user behavior perspective and adopting behavior-driven learning approaches to this problem.