Regression
The Secrets of Machine Learning: Ten Things You Wish You Had Known Earlier to be More Effective at Data Analysis
Rudin, Cynthia, Carlson, David
Despite the widespread usage of machine learning throughout organizations, there are some key principles that are commonly missed. In particular: 1) There are at least four main families for supervised learning: logical modeling methods, linear combination methods, case-based reasoning methods, and iterative summarization methods. 2) For many application domains, almost all machine learning methods perform similarly (with some caveats). Deep learning methods, which are the leading technique for computer vision problems, do not maintain an edge over other methods for most problems (and there are reasons why). 3) Neural networks are hard to train and weird stuff often happens when you try to train them. 4) If you don't use an interpretable model, you can make bad mistakes. 5) Explanations can be misleading and you can't trust them. 6) You can pretty much always find an accurate-yet-interpretable model, even for deep neural networks. 7) Special properties such as decision making or robustness must be built in, they don't happen on their own. 8) Causal inference is different than prediction (correlation is not causation). 9) There is a method to the madness of deep neural architectures, but not always. 10) It is a myth that artificial intelligence can do anything.
An interpretable machine learning framework for modelling human decision behavior
Guo, Mengzhuo, Zhang, Qingpeng, Liao, Xiuwu, Chen, Youhua
Machine learning has recently been widely adopted to address the managerial decision making problems. However, there is a trade-off between performance and interpretability. Full complexity models (such as neural network-based models) are non-traceable black-box, whereas classic interpretable models (such as logistic regression) are usually simplified with lower accuracy. This trade-off limits the application of state-of-the-art machine learning models in management problems, which requires high prediction performance, as well as the understanding of individual attributes' contributions to the model outcome. Multiple criteria decision aiding (MCDA) is a family of interpretable approaches to depicting the rationale of human decision behavior. It is also limited by strong assumptions (e.g. preference independence). In this paper, we propose an interpretable machine learning approach, namely Neural Network-based Multiple Criteria Decision Aiding (NN-MCDA), which combines an additive MCDA model and a fully-connected multilayer perceptron (MLP) to achieve good performance while preserving a certain degree of interpretability. NN-MCDA has a linear component (in an additive form of a set of polynomial functions) to capture the detailed relationship between individual attributes and the prediction, and a nonlinear component (in a standard MLP form) to capture the high-order interactions between attributes and their complex nonlinear transformations. We demonstrate the effectiveness of NN-MCDA with extensive simulation studies and two real-world datasets. To the best of our knowledge, this research is the first to enhance the interpretability of machine learning models with MCDA techniques. The proposed framework also sheds light on how to use machine learning techniques to free MCDA from strong assumptions.
Generalized Linear Rule Models
Wei, Dennis, Dash, Sanjeeb, Gao, Tian, Gรผnlรผk, Oktay
This paper considers generalized linear models using rule-based features, also referred to as rule ensembles, for regression and probabilistic classification. Rules facilitate model interpretation while also capturing nonlinear dependences and interactions. Our problem formulation accordingly trades off rule set complexity and prediction accuracy. Column generation is used to optimize over an exponentially large space of rules without pre-generating a large subset of candidates or greedily boosting rules one by one. The column generation subproblem is solved using either integer programming or a heuristic optimizing the same objective. In experiments involving logistic and linear regression, the proposed methods obtain better accuracy-complexity trade-offs than existing rule ensemble algorithms. At one end of the trade-off, the methods are competitive with less interpretable benchmark models.
Covariate-Powered Empirical Bayes Estimation
Ignatiadis, Nikolaos, Wager, Stefan
We study methods for simultaneous analysis of many noisy experiments in the presence of rich covariate information. The goal of the analyst is to optimally estimate the true effect underlying each experiment. Both the noisy experimental results and the auxiliary covariates are useful for this purpose, but neither data source on its own captures all the information available to the analyst. In this paper, we propose a flexible plug-in empirical Bayes estimator that synthesizes both sources of information and may leverage any black-box predictive model. We show that our approach is within a constant factor of minimax for a simple data-generating model. Furthermore, we establish robust convergence guarantees for our method that hold under considerable generality, and exhibit promising empirical performance on both real and simulated data.
Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement
Lin, Ming, Song, Xiaomin, Qian, Qi, Li, Hao, Sun, Liang, Zhu, Shenghuo, Jin, Rong
Satellite-based positioning system such as GPS often suffers from large amount of noise that degrades the positioning accuracy dramatically especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals for training. The Gaussian Process (GP) regression is chosen to model the vertical Total Electron Content (vTEC) distribution of the ionosphere of the Earth. Our experiments show that the noise in the real-time GPS signals often exceeds the breakdown point of the conventional robust regression methods resulting in sub-optimal system performance. We propose a three-step approach to address this challenge. In the first step we perform a set of signal validity tests to separate the signals into clean and dirty groups. In the second step, we train an initial model on the clean signals and then reweigting the dirty signals based on the residual error. A final model is retrained on both the clean signals and the reweighted dirty signals. In the theoretical analysis, we prove that the proposed three-step approach is able to tolerate much higher noise level than the vanilla robust regression methods if two reweighting rules are followed. We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods. Our method achieves centimeter positioning accuracy in the benchmark region with probability $78.4\%$ , outperforming the second best baseline method by a margin of $8.3\%$. The benchmark takes 6 hours on 20,000 CPU cores or 14 years on a single CPU.
Learning to Clear the Market
Shen, Weiran, Lahaie, Sรฉbastien, Leme, Renato Paes
The problem of market clearing is to set a price for an item such that quantity demanded equals quantity supplied. In this work, we cast the problem of predicting clearing prices into a learning framework and use the resulting models to perform revenue optimization in auctions and markets with contextual information. The economic intuition behind market clearing allows us to obtain fine-grained control over the aggressiveness of the resulting pricing policy, grounded in theory. To evaluate our approach, we fit a model of clearing prices over a massive dataset of bids in display ad auctions from a major ad exchange. The learned prices outperform other modeling techniques in the literature in terms of revenue and efficiency trade-offs. Because of the convex nature of the clearing loss function, the convergence rate of our method is as fast as linear regression.
How many variables should be entered in a principal component regression equation?
We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides with the principal component regression estimator; when $p>n$, the estimator is the least $\ell_2$ norm solution over the selected features. We give an average-case analysis of the out-of-sample prediction error as $p,n,N \to \infty$ with $p/N \to \alpha$ and $n/N \to \beta$, for some constants $\alpha \in [0,1]$ and $\beta \in (0,1)$. In this average-case setting, the prediction error exhibits a `double descent' shape as a function of $p$.
C2P2: A Collective Cryptocurrency Up/Down Price Prediction Engine
Bai, Chongyang, White, Tommy, Xiao, Linda, Subrahmanian, V. S., Zhou, Ziheng
We study the problem of predicting whether the price of the 21 most popular cryptocurrencies (according to coinmarketcap.com) will go up or down on day d, using data up to day d-1. Our C2P2 algorithm is the first algorithm to consider the fact that the price of a cryptocurrency c might depend not only on historical prices, sentiments, global stock indices, but also on the prices and predicted prices of other cryptocurrencies. C2P2 therefore does not predict cryptocurrency prices one coin at a time --- rather it uses similarity metrics in conjunction with collective classification to compare multiple cryptocurrency features to jointly predict the cryptocurrency prices for all 21 coins considered. We show that our C2P2 algorithm beats out a recent competing 2017 paper by margins varying from 5.1-83% and another Bitcoin-specific prediction paper from 2018 by 16%. In both cases, C2P2 is the winner on all cryptocurrencies considered. Moreover, we experimentally show that the use of similarity metrics within our C2P2 algorithm leads to a direct improvement for 20 out of 21 cryptocurrencies ranging from 0.4% to 17.8%. Without the similarity component, C2P2 still beats competitors on 20 out of 21 cryptocurrencies considered. We show that all these results are statistically significant via a Student's t-test with p<1e-5. Check our demo at https://www.cs.dartmouth.edu/dsail/demos/c2p2
Uncoupled Regression from Pairwise Comparison Data
Xu, Liyuan, Honda, Junya, Niu, Gang, Sugiyama, Masashi
Uncoupled regression is the problem to learn a model from unlabeled data and the set of target values while the correspondence between them is unknown. Such a situation arises in predicting anonymized targets that involve sensitive information, e.g., one's annual income. Since existing methods for uncoupled regression often require strong assumptions on the true target function, and thus, their range of applications is limited, we introduce a novel framework that does not require such assumptions in this paper. Our key idea is to utilize pairwise comparison data, which consists of pairs of unlabeled data that we know which one has a larger target value. Such pairwise comparison data is easy to collect, as typically discussed in the learning-to-rank scenario, and does not break the anonymity of data. We propose two practical methods for uncoupled regression from pairwise comparison data and show that the learned regression model converges to the optimal model with the optimal parametric convergence rate when the target variable distributes uniformly. Moreover, we empirically show that for linear models the proposed methods are comparable to ordinary supervised regression with labeled data.
The Importance of Analyzing Model Assumptions in Machine Learning
With this summary, we can see important values such as R2, the F-statistic, and many others. You can also analyze a model using a graphical diagnostic such as plotting the residuals against the fitted/predicted values. Above is the fitted versus residual plot for our weight-height dataset, using height as the predictor. For the most part, this plot is random. However, as fitted values increase, so does the range of residuals.