Regression
Minimum Distance Estimation for Robust High-Dimensional Regression
Lozano, Aurélie C., Meinshausen, Nicolai
We propose a minimum distance estimation method for robust regression in sparse high-dimensional settings. The traditional likelihood-based estimators lack resilience against outliers, a critical issue when dealing with high-dimensional noisy data. Our method, Minimum Distance Lasso (MD-Lasso), combines minimum distance functionals, customarily used in nonparametric estimation for their robustness, with l1-regularization for high-dimensional regression. The geometry of MD-Lasso is key to its consistency and robustness. The estimator is governed by a scaling parameter that caps the influence of outliers: the loss per observation is locally convex and close to quadratic for small squared residuals, and flattens for squared residuals larger than the scaling parameter. As the parameter approaches infinity, the estimator becomes equivalent to least-squares Lasso. MD-Lasso enjoys fast convergence rates under mild conditions on the model error distribution, which hold for any of the solutions in a convexity region around the true parameter and in certain cases for every solution. Remarkably, a first-order optimization method is able to produce iterates very close to the consistent solutions, with geometric convergence and regardless of the initialization. A connection is established with re-weighted least-squares that intuitively explains MD-Lasso robustness. The merits of our method are demonstrated through simulation and eQTL data analysis.
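The shape of the per-observation loss described above can be illustrated with a small sketch. The exact MD-Lasso objective is not given in the abstract, so the bounded loss below (an exponentially capped squared loss with scaling parameter `c`) is an assumption chosen only because it matches the stated behaviour: roughly quadratic for small squared residuals, flat beyond `c`, and quadratic everywhere as `c` grows. The function names `capped_loss` and `irls_weight` are hypothetical.

```python
import numpy as np

def capped_loss(residual, c):
    """Bounded loss: about r^2 / 2 when r^2 << c, approaching c when r^2 >> c.

    Assumed stand-in for the loss shape in the abstract, not the paper's formula.
    """
    r2 = np.square(residual)
    return c * (1.0 - np.exp(-r2 / (2.0 * c)))

def irls_weight(residual, c):
    """Weight psi(r) / r that a re-weighted least-squares step would assign.

    The derivative of capped_loss with respect to r is r * exp(-r^2 / (2c)),
    so the weight decays towards zero for large residuals (outliers).
    """
    return np.exp(-np.square(residual) / (2.0 * c))

residuals = np.array([0.1, 1.0, 10.0])
losses = capped_loss(residuals, c=2.0)     # small, moderate, and capped near c
weights = irls_weight(residuals, c=2.0)    # outlier receives a weight near zero
```

Under this assumed loss, an observation with a very large residual contributes at most `c` to the objective and receives almost no weight in a re-weighted least-squares step, which is one way to read the robustness claim.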
An Agent Design for Repeated Negotiation and Information Revelation with People
Peled, Noam (Bar Ilan University) | Gal, Ya'akov (Kobi) (Ben-Gurion University) | Kraus, Sarit (Bar Ilan University)
Many negotiations in the real world are characterized by incomplete information, and participants' success depends on their ability to reveal information in a way that facilitates agreement without compromising the individual gains of agents. This paper presents a novel agent design for repeated negotiation in incomplete information settings that learns to reveal information strategically during the negotiation process. The agent uses classical machine learning techniques to predict how people make and respond to offers during the negotiation, how they reveal information, and how they respond to potential revelation actions by the agent. The agent was evaluated in an extensive empirical study spanning hundreds of human subjects. Results show that the agent was able to outperform people. In particular, it learned (1) to make offers that were beneficial to people while not compromising its own benefit; (2) to incrementally reveal information to people in a way that increased its expected performance. The approach generalizes to new settings without the need to acquire additional data. This work demonstrates the efficacy of combining machine learning with opponent modeling techniques towards the design of computer agents for negotiating with people in settings of incomplete information.
Achieving greater Explanatory Power and Forecasting Accuracy with Non-uniform spread Fuzzy Linear Regression
Fuzzy regression models have been applied to several Operations Research problems, such as forecasting and prediction. Earlier works on fuzzy regression analysis obtain crisp regression coefficients to eliminate the problem of spreads of the estimated fuzzy responses increasing with the magnitude of the independent variable. But they cannot deal with the problem of non-uniform spreads. In this work, a three-phase approach is discussed to construct a fuzzy regression model with non-uniform spreads that deals with this problem. The first phase constructs the membership functions of the least-squares estimates of the regression coefficients based on the extension principle, so as to completely conserve the fuzziness of the observations. In the second phase, they are defuzzified by the centre of area method to obtain crisp regression coefficients. Finally, the error terms of the method are determined by setting each estimated spread equal to its corresponding observed spread. The Takagi-Sugeno inference system is used for improving the accuracy of forecasts. The simulation example demonstrates the strength of the fuzzy linear regression model in terms of higher explanatory power and forecasting performance.
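The centre of area defuzzification step in the second phase can be sketched as follows. The abstract does not say what shape the membership functions take, so the triangular membership function here is purely an assumption for illustration, and the names `triangular_membership` and `centre_of_area` are hypothetical.

```python
import numpy as np

def triangular_membership(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy number (assumed shape)."""
    rising = (x - left) / (peak - left)
    falling = (right - x) / (right - peak)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

def centre_of_area(x, mu):
    """Discrete centre of area: membership-weighted mean of the grid points."""
    return np.sum(x * mu) / np.sum(mu)

# Fuzzy coefficient "about 2", with more spread on the right than on the left.
x = np.linspace(0.0, 5.0, 50001)
mu = triangular_membership(x, left=1.0, peak=2.0, right=4.0)
crisp = centre_of_area(x, mu)  # close to (1 + 2 + 4) / 3 for a triangle
```

For an asymmetric membership function the crisp value is pulled away from the peak towards the longer tail, which is why the centre of area, rather than the mode, is used to summarize the fuzzy coefficient.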
Learning Mixed Graphical Models
Lee, Jason D., Hastie, Trevor J.
We consider the problem of learning the structure of a pairwise graphical model over continuous and discrete variables. We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning. In previous work, authors have considered structure learning of Gaussian graphical models and structure learning of discrete models. Our approach is a natural generalization of these two lines of work to the mixed case. The penalization scheme involves a novel symmetric use of the group-lasso norm and follows naturally from a particular parametrization of the model.
A Statistical Perspective on Algorithmic Leveraging
Ma, Ping, Mahoney, Michael W., Yu, Bin
One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales rows/columns of data matrices to reduce the data size before performing computations on the subproblem. This method has been successful in improving computational efficiency of algorithms for matrix problems such as least-squares approximation, least absolute deviations approximation, and low-rank matrix approximation. Existing work has focused on algorithmic issues such as worst-case running times and numerical issues associated with providing high-quality implementations, but none of it addresses statistical aspects of this method. In this paper, we provide a simple yet effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model with a fixed number of predictors. We show that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other. This result is particularly striking, given the well-known result that, from the algorithmic perspective of worst-case analysis, leverage-based sampling provides uniformly superior worst-case algorithmic results, when compared with uniform sampling. Based on these theoretical results, we propose and analyze two new leveraging algorithms. A detailed empirical evaluation of existing leverage-based methods as well as these two new methods is carried out on both synthetic and real data sets. The empirical results indicate that our theory is a good predictor of practical performance of existing and new leverage-based algorithms and that the new algorithms achieve improved performance.
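A minimal sketch of the basic leveraging procedure described above, for least-squares regression: compute the leverage scores, sample rows with probability proportional to them, rescale, and solve the smaller problem. The function names, the sampling with replacement, and the synthetic data are assumptions for illustration; the two new algorithms proposed in the paper are not reproduced here.

```python
import numpy as np

def leverage_scores(X):
    """Leverage of row i is the squared norm of row i of Q in a thin QR of X."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def leveraged_least_squares(X, y, r, rng):
    """Sample r rows by leverage, rescale them, and solve the subproblem."""
    n = X.shape[0]
    h = leverage_scores(X)
    probs = h / h.sum()                       # importance sampling distribution
    idx = rng.choice(n, size=r, replace=True, p=probs)
    scale = 1.0 / np.sqrt(r * probs[idx])     # rescaling keeps X'X unbiased
    Xs = X[idx] * scale[:, None]
    ys = y[idx] * scale
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1.0)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)       # uses all n rows
beta_sub = leveraged_least_squares(X, y, r=200, rng=rng)  # uses 200 sampled rows
```

Replacing `probs` with a uniform vector gives the uniform-sampling baseline against which the abstract compares leverage-based sampling in terms of bias and variance.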
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Ateniese, Giuseppe, Felici, Giovanni, Mancini, Luigi V., Spognardi, Angelo, Villani, Antonio, Vitali, Domenico
Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.
A lasso for hierarchical interactions
Bien, Jacob, Taylor, Jonathan, Tibshirani, Robert
We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity - the number of nonzero coefficients - and practical sparsity - the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
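The distinction between parameter sparsity and practical sparsity, and the hierarchy restriction itself, can be made concrete with a short sketch. It does not use or imitate the hierNet package; the dictionary representation of a fitted model and all function names are hypothetical, and "strong" versus "weak" here simply means "both" versus "at least one" main effect present.

```python
def parameter_sparsity(main, inter):
    """Number of nonzero coefficients (main effects plus interactions)."""
    return sum(v != 0 for v in main.values()) + sum(v != 0 for v in inter.values())

def practical_sparsity(main, inter):
    """Number of raw variables that must be measured to make a prediction."""
    used = {j for j, v in main.items() if v != 0}
    for (j, k), v in inter.items():
        if v != 0:
            used.update((j, k))
    return len(used)

def satisfies_hierarchy(main, inter, strong=True):
    """Check that each nonzero interaction has both (strong) or one (weak) main effect."""
    need = all if strong else any
    return all(
        need(main.get(v, 0) != 0 for v in pair)
        for pair, coef in inter.items()
        if coef != 0
    )

# Two models with the same number of nonzero coefficients.
hier_main = {"x1": 1.5, "x2": -0.7, "x3": 0.0, "x4": 0.0}
hier_inter = {("x1", "x2"): 0.4}
flat_main = {"x1": 1.5, "x2": 0.0, "x3": 0.0, "x4": 0.0}
flat_inter = {("x2", "x3"): 0.4, ("x3", "x4"): 0.2}

hier_counts = (parameter_sparsity(hier_main, hier_inter),
               practical_sparsity(hier_main, hier_inter))
flat_counts = (parameter_sparsity(flat_main, flat_inter),
               practical_sparsity(flat_main, flat_inter))
```

Both toy models have three nonzero coefficients, but the hierarchical one needs only two raw variables measured while the non-hierarchical one needs four, which is the data collection cost the abstract refers to.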
Joint estimation of sparse multivariate regression and conditional graphical models
The multivariate regression model is a natural generalization of the classical univariate regression model for fitting multiple responses. In this paper, we propose a high-dimensional multivariate conditional regression model for constructing sparse estimates of the multivariate regression coefficient matrix that accounts for the dependency structure among the multiple responses. The proposed method decomposes the multivariate regression problem into a series of penalized conditional log-likelihoods, one for each response conditioned on the covariates and the other responses. It allows simultaneous estimation of the sparse regression coefficient matrix and the sparse inverse covariance matrix. The asymptotic selection consistency and normality are established for the diverging dimension of the covariates and number of responses. The effectiveness of the proposed method is also demonstrated in a variety of simulated examples as well as an application to the Glioblastoma multiforme cancer data.
Stability of Multi-Task Kernel Regression Algorithms
Audiffren, Julien, Kadri, Hachem
We study the stability properties of nonlinear multi-task regression in reproducing Hilbert spaces with operator-valued kernels. Such kernels, a.k.a. multi-task kernels, are appropriate for learning problems with nonscalar outputs like multi-task learning and structured output prediction. We show that multi-task kernel regression algorithms are uniformly stable in the general case of infinite-dimensional output spaces. We then derive, under a mild assumption on the kernel, generalization bounds for such algorithms, and we show their consistency even with non Hilbert-Schmidt operator-valued kernels. We demonstrate how to apply the results to various multi-task kernel regression methods such as vector-valued SVR and functional ridge regression.
Spectral Experts for Estimating Mixtures of Linear Regressions
Chaganty, Arun Tejasvi, Liang, Percy
Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).
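The factorization step mentioned above can be sketched with a generic tensor power method. This is a minimal illustration under strong assumptions: the symmetric third-order tensor is taken as given and exactly orthogonally decomposable with positive weights, whereas the paper first recovers the tensor through a low-rank regression (and any whitening step is omitted here). The function name, restart count, and iteration count are hypothetical choices.

```python
import numpy as np

def tensor_power_method(T, n_components, n_iter=100, n_restarts=10, seed=0):
    """Recover (weight, vector) pairs of T = sum_r w_r * a_r (x) a_r (x) a_r.

    Assumes the a_r are orthonormal and the w_r are positive.
    """
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    T = T.copy()
    weights, vectors = [], []
    for _ in range(n_components):
        best_val, best_v = -np.inf, None
        for _ in range(n_restarts):
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
            for _ in range(n_iter):
                v = np.einsum("ijk,j,k->i", T, v, v)   # power update T(I, v, v)
                v /= np.linalg.norm(v)
            val = np.einsum("ijk,i,j,k->", T, v, v, v)  # eigenvalue T(v, v, v)
            if val > best_val:
                best_val, best_v = val, v
        weights.append(best_val)
        vectors.append(best_v)
        # Deflate: subtract the recovered rank-one term before the next round.
        T = T - best_val * np.einsum("i,j,k->ijk", best_v, best_v, best_v)
    return np.array(weights), np.array(vectors)

# Synthetic tensor with three orthonormal components in four dimensions.
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((4, 3)))   # columns are orthonormal
true_w = np.array([3.0, 2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", true_w, A, A, A)
w_hat, V_hat = tensor_power_method(T, n_components=3)
```

Unlike EM, the power iteration on an orthogonally decomposable tensor has no spurious local optima to get stuck in, which is the motivation the abstract gives for the spectral approach; with an estimated, noisy tensor the recovery would only be approximate.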