Regression
Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data
Culotta, Aron, Ravi, Nirmal Kumar, Cutler, Jennifer
Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. Whereas most prior approaches rely on a supervised learning approach, in which individual users are labeled with demographics for training, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics from information about the followers of each website on Twitter. Using patterns derived both from textual content and the social network of each user, our final model produces an average held-out correlation of .77 across seven different variables (age, gender, education, ethnicity, income, parental status, and political preference). We then apply this model to classify individual Twitter users by ethnicity, gender, and political preference, finding performance that is surprisingly competitive with a fully supervised approach.
GAP Safe Screening Rules for Sparse-Group-Lasso
Ndiaye, Eugene, Fercoq, Olivier, Gramfort, Alexandre, Salmon, Joseph
In high dimensional settings, sparse structures are crucial for efficiency, either in term of memory, computation or performance. In some contexts, it is natural to handle more refined structures than pure sparsity, such as for instance group sparsity. Sparse-Group Lasso has recently been introduced in the context of linear regression to enforce sparsity both at the feature level and at the group level. We adapt to the case of Sparse-Group Lasso recent safe screening rules that discard early in the solver irrelevant features/groups. Such rules have led to important speed-ups for a wide range of iterative methods. Thanks to dual gap computations, we provide new safe screening rules for Sparse-Group Lasso and show significant gains in term of computing time for a coordinate descent implementation.
Semi-parametric Order-based Generalized Multivariate Regression
Kharratzadeh, Milad, Coates, Mark
In this paper, we consider a generalized multivariate regression problem where the responses are monotonic functions of linear transformations of predictors. We propose a semi-parametric algorithm based on the ordering of the responses which is invariant to the functional form of the transformation function. We prove that our algorithm, which maximizes the rank correlation of responses and linear transformations of predictors, is a consistent estimator of the true coefficient matrix. We also identify the rate of convergence and show that the squared estimation error decays with a rate ofo(1/ n). We then propose a greedy algorithm to maximize the highly non-smooth objective function of our model and examine its performance through extensive simulations. Finally, we compare our algorithm with traditional multivariate regression algorithms over synthetic and real data. Let us rewrite (2) in matrix form: Y n q U (X n pB p q E n q), (3) where p is the number of predictors,q is the number of responses, andn denotes the number of instances.x T i, y T i, and T i correspond, respectively, to thei -th rows ofX, Y, and E . To findB, we propose to solve: B n arg max B 1 n( q 2) n i 1 q j 1 q k 11 (y ij y ik)1 (x T i b j x T i b k) ︸ ︷︷ ︸ S n ( B), (4) whereb j denotes the j -th column ofB . The intuition behind this formulation is that sinceU is increasing and the error is i.i.d. and independent ofx, when we havex T i b j x T i b k, it is more likely to havey ij y ik than the other way around. The term in the summation is zero forj k . Maximizing S n(B) is equivalent to maximizing the average rank correlation ofy T i and x T i B since 2 S n(B) 1 corresponds to the average over then observations of the Kendall rank correlation betweeny T i and x T i B . 2. Motivating Examples and Related Work 2.1. Learning from nonlinear measurements In many practical settings, the measurements or observations are noisy nonlinear functions of a linear transformation of an underlying signal. This could be due to the uncertainties and non-linearities of the measurement device or arise from the experimental design (e.g., censoring or quantization).
Lass-0: sparse non-convex regression by local search
Herlands, William, De-Arteaga, Maria, Neill, Daniel, Dubrawski, Artur
We compute approximate solutions to L0 regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results of consistency under orthogonality and appropriate handling of redundant features. Empirically, we use synthetic data to demonstrate that Lass-0 solutions are closer to the true sparse support than L1 regularization models. Additionally, in real-world data Lass-0 finds more parsimonious solutions than L1 regularization while maintaining similar predictive accuracy.
Secure Approximation Guarantee for Cryptographically Private Empirical Risk Minimization
Takada, Toshiyuki, Hanada, Hiroyuki, Yamada, Yoshiji, Sakuma, Jun, Takeuchi, Ichiro
Privacy concern has been increasingly important in many machine learning (ML) problems. We study empirical risk minimization (ERM) problems under secure multi-party computation (MPC) frameworks. Main technical tools for MPC have been developed based on cryptography. One of limitations in current cryptographically private ML is that it is computationally intractable to evaluate non-linear functions such as logarithmic functions or exponential functions. Therefore, for a class of ERM problems such as logistic regression in which non-linear function evaluations are required, one can only obtain approximate solutions. In this paper, we introduce a novel cryptographically private tool called secure approximation guarantee (SAG) method. The key property of SAG method is that, given an arbitrary approximate solution, it can provide a non-probabilistic assumption-free bound on the approximation quality under cryptographically secure computation framework. We demonstrate the benefit of the SAG method by applying it to several problems including a practical privacy-preserving data analysis task on genomic and clinical information.
Deep Gaussian Processes for Regression using Approximate Expectation Propagation
Bui, Thang D., Hernández-Lobato, Daniel, Li, Yingzhen, Hernández-Lobato, José Miguel, Turner, Richard E.
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.
Wikipedia in the Tourism Industry: Forecasting Demand and Modeling Usage Behavior
Khadivi, Pejman (Virginia Polytechnic Institute and State University) | Ramakrishnan, Naren (Virginia Polytechnic Institute and State University)
Due to the economic and social impacts of tourism, both private and public sectors are interested in precisely forecasting the tourism demand volume in a timely manner. With recent advances in social networks, more people use online resources to plan their future trips. In this paper we explore the application of Wikipedia usage trends (WUTs) in tourism analysis. We propose a framework that deploys WUTs for forecasting the tourism demand of Hawaii. We also propose a data-driven approach, using WUTs, to estimate the behavior of tourists when they plan their trips.
A statistical perspective of sampling scores for linear regression
Chen, Siheng, Varma, Rohan, Singh, Aarti, Kovačević, Jelena
In this paper, we consider a statistical problem of learning a linear model from noisy samples. Existing work has focused on approximating the least squares solution by using leverage-based scores as an importance sampling distribution. However, no finite sample statistical guarantees and no computationally efficient optimal sampling strategies have been proposed. To evaluate the statistical properties of different sampling strategies, we propose a simple yet effective estimator, which is easy for theoretical analysis and is useful in multitask linear regression. We derive the exact mean square error of the proposed estimator for any given sampling scores. Based on minimizing the mean square error, we propose the optimal sampling scores for both estimator and predictor, and show that they are influenced by the noise-to-signal ratio. Numerical simulations match the theoretical analysis well.
Feature Representation for ICU Mortality
Good predictors of ICU Mortality have the potential to identify high-risk patients earlier, improve ICU resource allocation, or create more accurate population-level risk models. Machine learning practitioners typically make choices about how to represent features in a particular model, but these choices are seldom evaluated quantitatively. This study compares the performance of different representations of clinical event data from MIMIC II in a logistic regression model to predict 36-hour ICU mortality. The most common representations are linear (normalized counts) and binary (yes/no). These, along with a new representation termed "hill", are compared using both L1 and L2 regularization. Results indicate that the introduced "hill" representation outperforms both the binary and linear representations, the hill representation thus has the potential to improve existing models of ICU mortality.
MPBART - Multinomial Probit Bayesian Additive Regression Trees
Kindo, Bereket P., Wang, Hao, Peña, Edsel A.
Multinomial probit (MNP) model for discrete choice modeling is often used in economics, market research, political sciences and transportation. It models the choices made by agents given their demographic characteristics and/or the features of the K 2 available choice alternatives. Examples include the study of consumer's purchasing behavior (e.g., McCulloch et al. (2000); Imai and van Dyk (2005)); voting behavior in multi-party elections (e.g., Quinn et al. (1999)); and choice of different modes of transportation (e.g., Bolduc (1999)). Details of the MNP model in which choices depend on predictors in a linear fashion is studied in McFadden et al.(1973); McFadden(1989); Keane(1992); McCulloch and Rossi (1994); Nobile (1998); McCulloch et al. (2000); Imai and van Dyk (2005); Train (2009); Burgette and Nordheim (2012) among others. Among widely used multinomial choice modeling procedures are the multinomial logit model (e.g., McFadden et al. (1973); Train (2009)) and multinomial probit model (e.g., McFadden (1989); McCulloch and Rossi (1994); Imai and van Dyk (2005)). The former relies on an assumption that a choice outcome is independent of removal (or introduction) of an irrelevant choice alternative while the latter including MPBART does not make this restrictive assumption.