Regression
Spline-Based Probability Calibration
In many classification problems it is desirable to output well-calibrated probabilities for the different classes. We propose a robust, non-parametric method of calibrating probabilities called SplineCalib that uses smoothing splines to determine a calibration function. We demonstrate how applying certain transformations as part of the calibration process can improve performance on problems in deep learning and other domains where the scores tend to be "overconfident". We adapt the approach to multi-class problems and find that better calibration can improve accuracy as well as log-loss by better resolving uncertain cases. Finally, we present a cross-validated approach to calibration which conserves data. Significant improvements to log-loss and accuracy are shown on several different problems. We also introduce the ml-insights Python package, which contains an implementation of the SplineCalib algorithm.
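To make the idea of spline calibration on transformed scores concrete, here is a minimal sketch. It is not SplineCalib and does not use the ml-insights API: as a simplifying assumption it maps scores to the log-odds scale, fits a ridge-penalized cubic regression spline to the 0/1 outcomes by least squares, and clips the result to a valid probability range. The function names, knot count, penalty and the synthetic "overconfident" scores are all hypothetical choices for illustration.

```python
import numpy as np

def logit(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def spline_basis(z, knots):
    # Cubic truncated-power basis: 1, z, z^2, z^3 and (z - k)_+^3 per knot.
    cols = [np.ones_like(z), z, z ** 2, z ** 3]
    cols += [np.clip(z - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_spline_calibrator(scores, y, n_knots=5, penalty=1.0):
    # Work on the log-odds scale, where overconfidence shows up as stretching.
    x = logit(scores)
    center, scale = x.mean(), x.std()
    z = (x - center) / scale
    knots = np.quantile(z, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    B = spline_basis(z, knots)
    ridge = np.zeros(B.shape[1])
    ridge[4:] = penalty  # penalize only the knot terms to limit wiggliness
    coef = np.linalg.solve(B.T @ B + np.diag(ridge), B.T @ y)

    def calibrate(new_scores):
        zn = (logit(new_scores) - center) / scale
        return np.clip(spline_basis(zn, knots) @ coef, 1e-3, 1 - 1e-3)

    return calibrate

def log_loss(y, p):
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic overconfident scores: true log-odds stretched by a factor of 3.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.01, 0.99, size=20_000)
y = (rng.uniform(size=p_true.size) < p_true).astype(float)
scores = 1.0 / (1.0 + np.exp(-3.0 * logit(p_true)))

# Fit on one half, evaluate log-loss on the other half.
calibrate = fit_spline_calibrator(scores[:10_000], y[:10_000])
raw_loss = log_loss(y[10_000:], scores[10_000:])
cal_loss = log_loss(y[10_000:], calibrate(scores[10_000:]))
```

Fitting on the log-odds scale matters here because overconfident scores pile up near 0 and 1, where a spline in the raw score would have little room to bend; in log-odds space the same scores are spread out and the correction is smooth.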
Using Eigencentrality to Estimate Joint, Conditional and Marginal Probabilities from Mixed-Variable Data: Method and Applications
The ability to estimate joint, conditional and marginal probability distributions over some set of variables is of great utility for many common machine learning tasks. However, estimating these distributions can be challenging, particularly for data containing a mix of discrete and continuous variables. This paper presents a nonparametric method for estimating these distributions directly from a dataset. The data are first represented as a graph consisting of object nodes and attribute value nodes. Depending on the distribution to be estimated, an appropriate eigenvector equation is constructed and solved to find the corresponding stationary distribution of the graph, from which the required distributions can be estimated and sampled. The paper demonstrates how the method can be applied to many common machine learning tasks, including classification, regression, missing value imputation, outlier detection, random vector generation, and clustering.
Being able to estimate joint, conditional and marginal probabilities from a dataset allows a broad range of useful tasks to be performed. For example, classification and regression involve predicting the value of some target variable conditional on the values of the other variables. If we can sample values from the estimated distributions, we can perform random vector generation by generating full random vectors that display the same correlations as the vectors (i.e., data points) in the original data [4], [5]. If we can estimate the joint distribution for the full dataset, then we should also be able to do so for subsets of the data, which allows Expectation-Maximization [6] to be used to cluster the data [7]. Taken together, these activities cover a large share of the tasks commonly performed in machine learning.
All of this depends, of course, on being able to estimate the various probabilities, and this is particularly challenging on datasets containing a complex mix of continuous and discrete variables.
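The graph construction described above can be sketched for purely discrete data as follows. The paper's specific eigenvector equations are not given here, so as an assumption this sketch uses the plain stationary distribution of a lazy random walk on the object/attribute-value graph, found by power iteration. For this undirected graph the result is proportional to node degree, so normalizing the mass on one attribute's value nodes recovers that attribute's empirical marginal; the function names and toy rows are hypothetical.

```python
from collections import defaultdict

def build_graph(rows):
    # Bipartite graph: one node per object, one node per (attribute, value) pair.
    adj = defaultdict(set)
    for i, row in enumerate(rows):
        obj = ("obj", i)
        for j, value in enumerate(row):
            val = ("val", j, value)
            adj[obj].add(val)
            adj[val].add(obj)
    return adj

def stationary_distribution(adj, tol=1e-12, max_iter=10_000):
    # Power iteration for a lazy random walk (stay put with probability 1/2),
    # which avoids the oscillation a plain walk has on a bipartite graph.
    nodes = list(adj)
    pi = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = {n: 0.5 * pi[n] for n in nodes}
        for n in nodes:
            share = 0.5 * pi[n] / len(adj[n])
            for m in adj[n]:
                new[m] += share
        if max(abs(new[n] - pi[n]) for n in nodes) < tol:
            return new
        pi = new
    return pi

def marginal(pi, attr):
    # Normalize stationary mass over the value nodes of one attribute.
    mass = {n[2]: p for n, p in pi.items() if n[0] == "val" and n[1] == attr}
    total = sum(mass.values())
    return {v: p / total for v, p in mass.items()}

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
pi = stationary_distribution(build_graph(rows))
```

Here `marginal(pi, 0)` gives roughly 0.75 for "a" and 0.25 for "b", matching the counts in `rows`. Conditional estimates would need a modified equation (for example, restricting the walk to objects that match the conditioning values), which this sketch does not attempt.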
Improving Subseasonal Forecasting in the Western U.S. with Machine Learning
Hwang, Jessica, Orenstein, Paulo, Pfeiffer, Karl, Cohen, Judah, Mackey, Lester
Water managers in the western United States (U.S.) rely on long-term forecasts of temperature and precipitation to prepare for droughts and wet weather extremes. To improve the accuracy of these long-term forecasts, the Bureau of Reclamation and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting challenge in which participants aimed to skillfully predict temperature and precipitation in the western U.S. two to four weeks and four to six weeks in advance. Here we present and evaluate our machine learning approach to the Rodeo and release our SubseasonalRodeo dataset, collected to train and evaluate our forecasting system. Our system is an ensemble of two regression models. The first integrates the diverse collection of meteorological measurements and dynamic model forecasts in the SubseasonalRodeo dataset and prunes irrelevant predictors using a customized multitask model selection procedure. The second uses only historical measurements of the target variable (temperature or precipitation) and introduces multitask nearest neighbor features into a weighted local linear regression. Each model alone is significantly more accurate than the operational U.S. Climate Forecasting System (CFSv2), and our ensemble skill exceeds that of the top Rodeo competitor for each target variable and forecast horizon. We hope that both our dataset and our methods will serve as valuable benchmarking tools for the subseasonal forecasting problem.
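The second model is described as a weighted local linear regression. A generic version of that building block is sketched below; it is not the authors' model. The Gaussian kernel weighting, the bandwidth and the toy features are assumptions, and the multitask nearest-neighbor features from the abstract are not constructed here.

```python
import numpy as np

def local_linear_predict(X, y, x0, bandwidth=1.0):
    """Predict y at x0 by weighted least squares on a line centred at x0."""
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights (assumed choice)
    A = np.column_stack([np.ones(len(X)), X - x0])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(coef[0])  # intercept = fitted value at x0

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]
pred = local_linear_predict(X, y, np.array([0.5, 1.0]))
```

Centering the design at `x0` means the intercept is the prediction itself, and on exactly linear data such as this toy example the local fit reproduces the true value of 2.5.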
Noise Statistics Oblivious GARD For Robust Regression With Sparse Outliers
Kallummil, Sreejith, Kalyani, Sheetal
Linear regression models contaminated by Gaussian noise (inlier) and possibly unbounded sparse outliers are common in many signal processing applications. Sparse recovery inspired robust regression (SRIRR) techniques are shown to deliver high quality estimation performance in such regression models. Unfortunately, most SRIRR techniques assume a priori knowledge of noise statistics like inlier noise variance or outlier statistics like the number of outliers. Both inlier and outlier noise statistics are rarely known a priori, and this limits the efficient operation of many SRIRR algorithms. This article proposes a novel noise statistics oblivious algorithm called residual ratio thresholding GARD (RRT-GARD) for robust regression in the presence of sparse outliers. RRT-GARD is developed by modifying the recently proposed noise statistics dependent greedy algorithm for robust de-noising (GARD). Both finite sample and asymptotic analytical results indicate that RRT-GARD performs nearly as well as GARD with a priori knowledge of noise statistics. Numerical simulations on real and synthetic data sets also point to the highly competitive performance of RRT-GARD.
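A rough sketch of the two ideas in this abstract: a GARD-style greedy loop that repeatedly flags the sample with the largest residual and refits with an indicator column for it, and a residual-ratio rule that picks how many flagged samples to keep without knowing the noise variance. As a loud simplification, the threshold here is a hypothetical constant (0.5), whereas RRT-GARD derives a step-dependent threshold from its analysis; all names and the synthetic data are illustrative.

```python
import numpy as np

def gard_path(X, y, k_max):
    """Run k_max GARD-style steps, recording the residual norm after each one."""
    n = X.shape[0]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    support, norms = [], [np.linalg.norm(r)]
    for _ in range(k_max):
        scores = np.abs(r)
        if support:
            scores[support] = -1.0  # never re-select a flagged sample
        support.append(int(np.argmax(scores)))
        # One indicator column per flagged sample absorbs its outlier value.
        A = np.hstack([X, np.eye(n)[:, support]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ coef
        norms.append(np.linalg.norm(r))
    return support, norms

def rrt_select(support, norms, threshold=0.5):
    # Residual ratio ||r_k|| / ||r_{k-1}||; pick the largest k below the threshold.
    ratios = [norms[k] / norms[k - 1] for k in range(1, len(norms))]
    below = [k for k, rr in enumerate(ratios, start=1) if rr < threshold]
    k_hat = max(below) if below else 0
    return sorted(support[:k_hat])

rng = np.random.default_rng(1)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
outliers = [5, 17, 42]
y[outliers] += np.array([25.0, -28.0, 30.0])
support, norms = gard_path(X, y, k_max=10)
detected = rrt_select(support, norms)
```

The ratio drops sharply at the step where the last outlier is absorbed, because only inlier noise remains, while flagging an inlier barely changes the residual norm. That contrast is what lets a ratio rule stand in for a noise-variance-based stopping rule.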
A convex formulation for high-dimensional sparse sliced inverse regression
Tan, Kean Ming, Wang, Zhaoran, Zhang, Tong, Liu, Han, Cook, R. Dennis
Sliced inverse regression is a popular tool for sufficient dimension reduction, which replaces covariates with a minimal set of their linear combinations without loss of information on the conditional distribution of the response given the covariates. The estimated linear combinations include all covariates, making results difficult to interpret and perhaps unnecessarily variable, particularly when the number of covariates is large. In this paper, we propose a convex formulation for fitting sparse sliced inverse regression in high dimensions. Our proposal estimates the subspace of the linear combinations of the covariates directly and performs variable selection simultaneously. We solve the resulting convex optimization problem via the linearized alternating direction method of multipliers (ADMM) algorithm, and establish an upper bound on the subspace distance between the estimated and the true subspaces. Through numerical studies, we show that our proposal is able to identify the correct covariates in the high-dimensional setting.
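For reference, classical (non-sparse) sliced inverse regression can be sketched in a few lines: whiten the covariates, slice the sorted response, average the whitened covariates within each slice, and take the leading eigenvectors of the weighted covariance of those slice means. This is the standard estimator the abstract builds on, not the proposed convex sparse formulation; the slice count and the synthetic data are assumptions.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T  # Sigma^{-1/2}
    Z = Xc @ inv_sqrt  # whitened covariates
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)  # slice mean of the whitened covariates
        M += (len(idx) / n) * np.outer(m, m)
    _, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :n_directions]  # leading eigenvectors of M
    B = inv_sqrt @ top  # map back to the original covariate scale
    return B / np.linalg.norm(B, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
y = np.exp(0.5 * (X[:, 0] + X[:, 1])) + 0.1 * rng.normal(size=2000)
B = sliced_inverse_regression(X, y)
true_dir = np.array([1.0, 1.0, 0, 0, 0, 0]) / np.sqrt(2)
```

The recovered direction lines up closely with `true_dir`, but its loadings on the four irrelevant covariates are small rather than exactly zero. That is the interpretability problem the sparse formulation is meant to address.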
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Gombolay, Matthew, Jensen, Reed, Stigile, Jessica, Golen, Toni, Shah, Neel, Son, Sung-Hyun, Shah, Julie
Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this domain knowledge within a computational framework is necessary to scale beyond the "single-expert, single-trainee" apprenticeship model. However, human domain experts often have difficulty describing their decision-making processes. We propose a new approach for capturing this decision-making process through counterfactual reasoning in pairwise comparisons. Our approach is model-free and does not require iterating through the state space. We demonstrate that this approach accurately learns multifaceted heuristics on synthetic and real-world data sets. We also demonstrate that policies learned from human scheduling demonstrations via apprenticeship learning can substantially improve the efficiency of schedule optimization. We employ this human-machine collaborative optimization technique on a variant of the weapon-to-target assignment problem. We demonstrate that this technique generates optimal solutions up to 9.5 times faster than a state-of-the-art optimization algorithm.
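One way to read "counterfactual reasoning in pairwise comparisons" is sketched below, under stated assumptions: each demonstrated decision yields a positive example "chosen task minus alternative" and a counterfactual negative example "alternative minus chosen task", and a linear logistic model trained on these differences then acts as a scheduling policy. The two features (deadline, duration), the hand-made demonstrations and the training settings are hypothetical, and the paper's own feature set and classifier may differ.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def pairwise_examples(decisions):
    """Turn (task_features, chosen_index) demonstrations into pairwise examples."""
    X, y = [], []
    for tasks, chosen in decisions:
        for j, other in enumerate(tasks):
            if j == chosen:
                continue
            diff = [a - b for a, b in zip(tasks[chosen], other)]
            X.append(diff)
            y.append(1)  # the expert preferred `chosen` over `other`
            X.append([-d for d in diff])
            y.append(0)  # counterfactual: `other` over `chosen` was not picked
    return X, y

def train_logistic(X, y, lr=0.01, epochs=200):
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)))
            for k, xk in enumerate(xi):
                grad[k] += (yi - p) * xk
        w = [wk + lr * g for wk, g in zip(w, grad)]  # gradient ascent on log-likelihood
    return w

def choose_task(w, tasks):
    # With a linear pairwise model, the task that wins every comparison has the top score.
    scores = [sum(wk * fk for wk, fk in zip(w, f)) for f in tasks]
    return max(range(len(tasks)), key=scores.__getitem__)

# Hypothetical demonstrations: features are [deadline, duration]; the expert
# always picks the task with the earliest deadline.
demos = [
    ([[2, 1], [7, 5], [4, 3]], 0),
    ([[6, 2], [3, 4], [9, 1]], 1),
    ([[5, 3], [8, 6], [1, 2]], 2),
]
X, y = pairwise_examples(demos)
w = train_logistic(X, y)
```

Working with feature differences makes the model independent of any absolute state encoding, which fits the abstract's point that the approach does not need to iterate through the state space.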
My Running Code from Andrew Ng's Machine Learning Intro
So instead, I've put this video together showing my code running. Topics covered broadly follow the course contents:
* Linear Regression
* Logistic Regression
* Regularization
* Hand-writing Recognition
* Neural Networks
* Support Vector Machines
* Unsupervised Learning
* Anomaly Detection
* Recommender Systems
Thanks for watching!
Kapil Sharma
Least squares estimates are often unsatisfactory because of their poor out-of-sample performance, especially when the model is overly complex with many features. We can attribute this to the low bias and high variance of least squares estimates. Additionally, when the model has many features, it is harder to identify the features with the strongest effects, or what we call the Big Picture. Hence, we might want to choose fewer features, accepting a somewhat worse in-sample fit (more bias) in exchange for lower variance and better out-of-sample prediction. Regularization is a method to shrink or drop coefficients/parameters from a model by imposing a penalty on their size.
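As an illustration of shrinking coefficients with a size penalty, here is a minimal ridge regression sketch: it minimizes ||y - b0 - Xb||^2 + lam * ||b||^2 with an unpenalized intercept, using the closed-form solution on centered data. Ridge is chosen here as one concrete penalty (it shrinks but does not drop coefficients; the lasso's absolute-value penalty is the usual way to drop them), and the synthetic data are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression with an unpenalised intercept, solved in closed form."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering removes the intercept from the penalty
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

# 10 features, only the first one matters, and just 30 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = 3.0 * X[:, 0] + rng.normal(size=30)
norms = [np.linalg.norm(ridge(X, y, lam)[1]) for lam in (0.0, 1.0, 10.0, 100.0)]
```

With `lam = 0` this reduces to ordinary least squares, and the coefficient norm shrinks steadily as `lam` grows. In practice `lam` is chosen by cross-validation to balance the bias it introduces against the variance it removes.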
Short-term Cognitive Networks, Flexible Reasoning and Nonsynaptic Learning
Nápoles, Gonzalo, Vanhoenshoven, Frank, Vanhoof, Koen
While the machine learning literature dedicated to fully automated reasoning algorithms is abundant, methods that enable inference on the basis of previously defined knowledge structures are scarcer. Fuzzy Cognitive Maps (FCMs) are neural networks that can be exploited towards this goal because of their flexibility in handling external knowledge. However, FCMs suffer from a number of issues, ranging from a limited prediction horizon to the absence of theoretically sound learning algorithms able to produce accurate predictions. In this paper, we propose a neural network system named Short-term Cognitive Networks that tackles some of these limitations. In our model, weights are not constrained and may or may not have a causal nature. As a second contribution, we present a nonsynaptic learning algorithm that improves the network's performance without modifying the previously defined weights. Moreover, we derive a stop condition that prevents the learning algorithm from iterating without decreasing the simulation error.
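The contrast between synaptic and nonsynaptic learning can be illustrated with a small, hypothetical sketch. It uses the classic FCM-style update, where each concept's activation is a sigmoid of the weighted sum of incoming activations. As an assumption, the "nonsynaptic" parameters are taken to be per-neuron sigmoid slopes, tuned by a simple hill climb while the expert-defined weight matrix `W` is never touched. The loop stops as soon as a full sweep fails to reduce the simulation error, in the spirit of the stop condition mentioned above. This is not the Short-term Cognitive Networks algorithm itself; the weights, activations and targets are made up.

```python
import math

def sigmoid(x, slope, offset):
    return 1.0 / (1.0 + math.exp(-slope * (x - offset)))

def reason(W, a0, slopes, offsets, steps):
    """Classic FCM-style update: a_j <- f_j(sum_i a_i * W[i][j])."""
    a = list(a0)
    n = len(a)
    for _ in range(steps):
        a = [sigmoid(sum(a[i] * W[i][j] for i in range(n)), slopes[j], offsets[j])
             for j in range(n)]
    return a

def sim_error(W, a0, target, slopes, offsets, steps):
    out = reason(W, a0, slopes, offsets, steps)
    return sum((o - t) ** 2 for o, t in zip(out, target)) / len(target)

def fit_slopes(W, a0, target, steps=3, step_size=0.5, max_rounds=50):
    """Hypothetical nonsynaptic tuning: adjust per-neuron slopes, never W."""
    n = len(a0)
    slopes, offsets = [1.0] * n, [0.0] * n
    best = sim_error(W, a0, target, slopes, offsets, steps)
    for _ in range(max_rounds):
        improved = False
        for j in range(n):
            for delta in (step_size, -step_size):
                trial = slopes.copy()
                trial[j] = max(0.1, trial[j] + delta)  # keep slopes positive
                err = sim_error(W, a0, target, trial, offsets, steps)
                if err < best:
                    slopes, best, improved = trial, err, True
                    break
        if not improved:  # stop once a full sweep fails to reduce the error
            break
    return slopes, best

W = [[0.0, 0.6, -0.4], [0.3, 0.0, 0.5], [-0.2, 0.7, 0.0]]
W_before = [row[:] for row in W]
a0 = [0.8, 0.1, 0.5]
target = [0.3, 0.9, 0.6]
slopes, best = fit_slopes(W, a0, target)
```

Because only the transfer-function parameters change, any causal meaning an expert attached to the entries of `W` is preserved after learning.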
Essential Math for Data Science: 'Why' and 'How'
Mathematics is the bedrock of any contemporary discipline of science. It is no surprise, then, that almost all the techniques of modern data science (including all of machine learning) have some deep mathematical underpinning or other. In this article, we discuss the essential math topics to master to become a better data scientist in all aspects. Sometimes, as a data scientist (or even as a junior analyst on the team), you have to learn that foundational mathematics thoroughly to use or apply the techniques properly; other times you can just get by using an API or an out-of-the-box algorithm.