Regression
Statistical Inference Using SGD
Li, Tianyang (University of Texas at Austin) | Liu, Liu (University of Texas at Austin) | Kyrillidis, Anastasios (IBM T.J. Watson Research Center, Yorktown Heights) | Caramanis, Constantine (University of Texas at Austin)
We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. To show the merits of our scheme, we apply it to both synthetic and real data sets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
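As a rough illustration of the averaging idea in the abstract above, the following sketch runs fixed-step SGD on a least-squares problem and averages the iterates after a burn-in period. It is not the authors' procedure: the function name `averaged_sgd_linear`, the parameters `step`, `n_steps`, and `burn_in`, and the synthetic data are all assumptions made for illustration, and the scaling step needed to turn such averages into confidence intervals is not shown.

```python
import numpy as np

def averaged_sgd_linear(X, y, step=0.01, n_steps=20000, burn_in=2000, seed=0):
    """Fixed-step SGD on squared loss; returns the mean of post-burn-in iterates.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    running_sum = np.zeros(d)
    for t in range(n_steps):
        i = rng.integers(n)                      # sample one data point uniformly
        grad = (X[i] @ theta - y[i]) * X[i]      # gradient of 0.5 * (x_i @ theta - y_i)**2
        theta = theta - step * grad              # step size stays fixed throughout
        if t >= burn_in:
            running_sum += theta                 # accumulate iterates after burn-in
    return running_sum / (n_steps - burn_in)

# Synthetic data (assumed for the example).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=500)

theta_bar = averaged_sgd_linear(X, y)                 # averaged SGD estimate
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]      # classical least-squares estimate
print(theta_bar, theta_ols)
```

On this toy problem the averaged iterate should land close to the ordinary least-squares solution, which is the kind of agreement with classical methods the abstract refers to.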
A Framework for Multistream Regression With Direct Density Ratio Estimation
Haque, Ahsanul (University of Texas at Dallas) | Tao, Hemeng (University of Texas at Dallas) | Chandra, Swarup (University of Texas at Dallas) | Liu, Jie (University of Texas at Dallas) | Khan, Latifur (University of Texas at Dallas)
Regression over a stream of data is challenging due to unbounded data size and a non-stationary distribution over time. Typically, a traditional supervised regression model over a data stream is trained on data instances occurring within a short time period by assuming a stationary distribution. This model is later used to predict the value of the response variable in future instances. Over time, the model may degrade in performance due to changes in data distribution among incoming data instances. Updating the model for change adaptation requires the true value for every recent data instance, which is scarce in practice. To overcome this issue, recent studies have employed techniques that sample fewer instances to be used for model retraining. Yet, this may introduce sampling bias that adversely affects the model performance. In this paper, we study the regression problem over data streams in a novel setting. We consider two independent, yet related, non-stationary data streams, which are referred to as the source and the target stream. The target stream continuously generates data instances whose response variable value is unknown. The source stream, however, continuously generates data instances along with the corresponding value of the response variable, and has a biased data distribution with respect to the target stream. We refer to the problem of using a model trained on the biased source stream to predict the response variable's value in data instances occurring on the target stream as Multistream Regression. In this paper, we describe a framework for multistream regression that simultaneously overcomes distribution bias and detects change in data distribution represented by the two streams over time using a Gaussian kernel model. We analyze the theoretical properties of the proposed approach and empirically evaluate it on both real-world and synthetic data sets. Importantly, our results indicate superior performance by the framework compared to other baseline regression methods.
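To make the notion of direct density ratio estimation with a Gaussian kernel model more concrete, here is a minimal sketch that estimates importance weights p_target(x) / p_source(x) for source instances. It uses uLSIF (unconstrained least-squares importance fitting), a standard closed-form estimator chosen here for brevity; the abstract does not say which estimator the framework uses, so this is a swapped-in technique. The names `gaussian_kernel` and `ulsif_weights`, the parameters `sigma`, `lam`, and `n_centers`, and the synthetic streams are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Pairwise Gaussian kernel values between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def ulsif_weights(X_src, X_tgt, sigma=1.0, lam=1e-2, n_centers=50, seed=0):
    """Estimate p_tgt(x) / p_src(x) at the source points with uLSIF.

    Hypothetical helper: the ratio is modeled as a weighted sum of Gaussian
    kernels centered on a random subset of target points.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_tgt), size=min(n_centers, len(X_tgt)), replace=False)
    centers = X_tgt[idx]
    Phi_src = gaussian_kernel(X_src, centers, sigma)   # basis evaluated on source data
    Phi_tgt = gaussian_kernel(X_tgt, centers, sigma)   # basis evaluated on target data
    H = Phi_src.T @ Phi_src / len(X_src)               # second moment under the source
    h = Phi_tgt.mean(axis=0)                           # first moment under the target
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return np.maximum(Phi_src @ alpha, 0.0)            # clip negative ratio estimates

# Synthetic one-dimensional batches (assumed): the source is biased toward smaller x.
rng = np.random.default_rng(0)
X_src = rng.normal(loc=0.0, scale=1.0, size=(500, 1))
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(500, 1))

weights = ulsif_weights(X_src, X_tgt)
high = weights[X_src[:, 0] > 0.5].mean()    # source points that look like target data
low = weights[X_src[:, 0] < -0.5].mean()    # source points far from the target
print(high, low)
```

Weights of this kind can then be passed to a weighted regression fit on the labeled source data, so that source instances resembling the target stream count for more.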
Human Guided Linear Regression With Feature-Level Constraints
Gress, Aubrey (University of California, Davis) | Davidson, Ian (University of California, Davis)
Linear regression methods are commonly used by both researchers and data scientists due to their interpretability and their reduced likelihood of overfitting. However, these methods can still perform poorly if little labeled training data is available. Typical methods used to overcome a lack of labeled training data involve exploiting an outside source of labeled data or large amounts of unlabeled data. This includes areas such as active learning, semi-supervised learning and transfer learning, but in many domains these approaches are not always applicable because they require either a mechanism to label data, large amounts of unlabeled data or additional sources of sufficiently related data. In this paper, we explore an alternative, non-data-centric approach. We allow the user to guide the learning system through three forms of feature-level guidance which constrain the parameters of the regression function. Such guidance is unlikely to be perfectly accurate, so we derive methods which are robust to some amount of noise, a property we formally prove for one of our methods.
Margin Based PU Learning
Gong, Tieliang (Xi'an Jiaotong University) | Wang, Guangtao (University of Michigan) | Ye, Jieping (University of Michigan) | Xu, Zongben (Xi'an Jiaotong University) | Lin, Ming (University of Michigan)
The PU learning problem concerns learning from positive and unlabeled data. A popular heuristic is to iteratively enlarge the training set based on some margin-based criterion. However, little theoretical analysis has been conducted to support the success of these heuristic methods. In this work, we show that not all margin-based heuristic rules are able to improve the learned classifiers iteratively. We find that a so-called large positive margin oracle is necessary to guarantee the success of PU learning. Under this oracle, a provable positive-margin based PU learning algorithm is proposed for linear regression and classification under the truncated Gaussian distributions. The proposed algorithm is able to reduce the recovering error geometrically proportional to the positive margin. Extensive experiments on real-world datasets verify our theory and the state-of-the-art performance of the proposed PU learning algorithm.
Incomplete Label Multi-Task Ordinal Regression for Spatial Event Scale Forecasting
Gao, Yuyang (George Mason University) | Zhao, Liang (George Mason University)
Event scales are commonly used by practitioners to gauge subjective feelings on the magnitude and significance of social events. For example, the Centers for Disease Control and Prevention (CDC) utilizes a 10-level scale to distinguish the severity of flu outbreaks, and governments typically categorize violent outbreaks based on their intensity as reflected in multiple aspects. Effective forecasting of future event scales can be used qualitatively to determine reasonable resource allocations and facilitate accurate proactive actions by practitioners. Existing spatial event forecasting methods typically focus on the occurrence of events rather than their ordinal event scales, as this is very challenging in several respects, including 1) the ordinal nature of the event scale, 2) the spatial heterogeneity of event scaling in different geo-locations, 3) the incompleteness of scale label data for some spatial locations, and 4) the spatial correlation of event scale patterns. In order to address all these challenges concurrently, a MultI-Task Ordinal Regression (MITOR) framework is proposed to effectively forecast the scale of future events. Our model enforces similar feature sparsity patterns for different tasks while preserving the heterogeneity in their scale patterns. In addition, based on the first law of geography, we propose to enforce spatially close tasks to share similar scale patterns, with theoretical guarantees. Optimizing the proposed model amounts to a new non-convex and non-smooth problem with an isotonicity constraint, which is then solved by our new algorithm based on ADMM and dynamic programming. Extensive experiments on ten real-world datasets demonstrate the effectiveness and efficiency of the proposed model.
Parameter-Free Centralized Multi-Task Learning for Characterizing Developmental Sex Differences in Resting State Functional Connectivity
Zhu, Xiaofeng (University of Pennsylvania) | Li, Hongming (University of Pennsylvania) | Fan, Yong (University of Pennsylvania)
In contrast to most existing studies that typically characterize the developmental sex differences using analysis of variance or, equivalently, multiple linear regression, we present a parameter-free centralized multi-task learning method to identify sex-specific and common resting state functional connectivity (RSFC) patterns underlying brain development based on resting state functional MRI (rs-fMRI) data. Specifically, we design a novel multi-task learning model to characterize sex-specific and common RSFC patterns in an age prediction framework by regarding the age prediction for males and females as separate tasks. Moreover, the importance of each task and the balance of these two patterns are automatically learned in order to make the multi-task learning robust as well as free of tunable parameters, i.e., parameter-free for short. Our experimental results on synthetic datasets verified the effectiveness of our method with respect to prediction performance, and experimental results on rs-fMRI scans of 1041 subjects (651 males) of the Philadelphia Neurodevelopmental Cohort (PNC) showed that our method could improve age prediction by 5.82% on average, with statistical significance, over the best alternative methods under comparison, in addition to characterizing the developmental sex differences in RSFC patterns.
Fair Inference on Outcomes
Nabi, Razieh (Johns Hopkins University) | Shpitser, Ilya (Johns Hopkins University)
In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are "sensitive," in the sense of having the potential to create discrimination. We argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.
Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning
Wang, Di (State University of New York at Buffalo) | Xu, Jinhui (State University of New York at Buffalo)
In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithms combine (accelerated) mini-batch SGD with a new method called two-step preconditioning to achieve an approximate solution with a time complexity lower than that of the state-of-the-art techniques for the low precision case. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity. Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases.
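The general idea of preconditioning a regression problem with a sketch can be illustrated as follows. This is only a sketch under stated assumptions, not the two-step preconditioning method of the abstract: it uses a single Gaussian sketch followed by a QR factorization, handles the unconstrained problem only, and replaces (accelerated) mini-batch SGD with plain full-gradient descent so that the result is deterministic. The sizes `n`, `d`, `m`, the step size, the iteration count, and the synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 200                      # rows, columns, sketch size (assumed)

# Ill-conditioned design matrix: columns live on very different scales.
A = rng.normal(size=(n, d)) * np.array([1.0, 10.0, 100.0, 0.1, 5.0])
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

# Sketch-and-QR preconditioner: factor the small matrix S @ A instead of A.
S = rng.normal(size=(m, n)) / np.sqrt(m)    # Gaussian sketching matrix
_, R = np.linalg.qr(S @ A)                  # R is d x d upper triangular
U = np.linalg.solve(R.T, A.T).T             # U = A @ inv(R), well conditioned

cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(U)

# Gradient descent on min_z 0.5 * ||U z - b||^2, then map back via x = inv(R) z.
z = np.zeros(d)
for _ in range(100):
    z -= 0.5 * (U.T @ (U @ z - b))
x_hat = np.linalg.solve(R, z)

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]   # direct solve for comparison
print(cond_before, cond_after)
```

Because the preconditioned matrix has a condition number close to one, a fixed step size converges in a small number of iterations, whereas the same iteration on the original matrix would need a far smaller step and many more passes.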
A Regression Approach for Modeling Games With Many Symmetric Players
Wiedenbeck, Bryce (Swarthmore College) | Yang, Fengjun (Swarthmore College) | Wellman, Michael P. (University of Michigan)
We exploit player symmetry to formulate the representation of large normal-form games as a regression task. This formulation allows arbitrary regression methods to be employed in estimating utility functions from a small subset of the game's outcomes. We demonstrate the applicability of both neural networks and Gaussian process regression, but focus on the latter. Once utility functions are learned, computing Nash equilibria requires estimating expected payoffs of pure-strategy deviations from mixed-strategy profiles. Computing these expectations exactly requires an infeasible sum over the full payoff matrix, so we propose and test several approximation methods. Three of these are simple and generic, applicable to any regression method and games with any number of player roles. However, the best performance is achieved by a continuous integral that approximates the summation, which we formulate for the specific case of fully-symmetric games learned by Gaussian process regression with a radial basis function kernel. We demonstrate experimentally that the combination of learned utility functions and expected payoff estimation allows us to efficiently identify approximate equilibria of large games using sparse payoff data.
DepthLGP: Learning Embeddings of Out-of-Sample Nodes in Dynamic Networks
Ma, Jianxin (Tsinghua University) | Cui, Peng (Tsinghua University) | Zhu, Wenwu (Tsinghua University)
Network embedding algorithms to date are primarily designed for static networks, where all nodes are known before learning. How to infer embeddings for out-of-sample nodes, i.e. nodes that arrive after learning, remains an open problem. The problem poses great challenges to existing methods, since the inferred embeddings should preserve intricate network properties such as high-order proximity, share similar characteristics (i.e. be of a homogeneous space) with in-sample node embeddings, and be of low computational cost. To overcome these challenges, we propose a Deeply Transformed High-order Laplacian Gaussian Process (DepthLGP) method to infer embeddings for out-of-sample nodes. DepthLGP combines the strengths of nonparametric probabilistic modeling and deep learning. In particular, we design a high-order Laplacian Gaussian process (hLGP) to encode network properties, which permits fast and scalable inference. In order to further ensure homogeneity, we then employ a deep neural network to learn a nonlinear transformation from latent states of the hLGP to node embeddings. DepthLGP is general, in that it is applicable to embeddings learned by any network embedding algorithm. We theoretically prove the expressive power of DepthLGP, and conduct extensive experiments on real-world networks. Empirical results demonstrate that our approach can achieve significant performance gain over existing approaches.