Regression
Neural networks for post-processing ensemble weather forecasts
Rasp, Stephan, Lerch, Sebastian
Ensemble weather predictions require statistical post-processing of systematic errors to obtain reliable and accurate probabilistic forecasts. Traditionally, this is accomplished with distributional regression models in which the parameters of a predictive distribution are estimated from a training period. We propose a flexible alternative based on neural networks that can incorporate nonlinear relationships between arbitrary predictor variables and forecast distribution parameters that are automatically learned in a data-driven way rather than requiring pre-specified link functions. In a case study of 2-meter temperature forecasts at surface stations in Germany, the neural network approach significantly outperforms benchmark post-processing methods while being computationally more affordable. Key components to this improvement are the use of auxiliary predictor variables and station-specific information with the help of embeddings. Furthermore, the trained neural network can be used to gain insight into the importance of meteorological variables thereby challenging the notion of neural networks as uninterpretable black boxes. Our approach can easily be extended to other statistical post-processing and forecasting problems. We anticipate that recent advances in deep learning combined with the ever-increasing amounts of model and observation data will transform the post-processing of numerical weather forecasts in the coming decade.
Do Better ImageNet Models Transfer Better?
Kornblith, Simon, Shlens, Jonathon, Le, Quoc V.
Transfer learning has become a cornerstone of computer vision with the advent of ImageNet features, yet little work has been done to evaluate the performance of ImageNet architectures across different datasets. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance of 13 classification models on 12 image classification tasks in three settings: as fixed feature extractors, fine-tuned, and trained from random initialization. We find that, when networks are used as fixed feature extractors, ImageNet accuracy is only weakly predictive of accuracy on other tasks ($r^2=0.24$). In this setting, ResNets consistently outperform networks that achieve higher accuracy on ImageNet. When networks are fine-tuned, we observe a substantially stronger correlation ($r^2 = 0.86$). We achieve state-of-the-art performance on eight image classification tasks simply by fine-tuning state-of-the-art ImageNet architectures, outperforming previous results based on specialized methods for transfer learning. Finally, we observe that, on three small fine-grained image classification datasets, networks trained from random initialization perform similarly to ImageNet-pretrained networks. Together, our results show that ImageNet architectures generalize well across datasets, with small improvements in ImageNet accuracy producing improvements across other tasks, but ImageNet features are less general than previously suggested.
[N] Snap ML - An IBM framework for all machine learning, except deep learning • r/MachineLearning
I do think that beating TensorFlow on tasks like logistic regression is not particularly hard. A student once asked me to help optimize his TensorFlow code for a large-scale linear regression model on multiple GPUs. It was orders of magnitude slower than the single-core scikit-learn implementation. We spent hours trying to get the best performance out of it, including various experiments with loading the data directly into GPU tensors, bypassing the Python runtime. TensorFlow is just not optimized for this kind of stuff because of various overheads, I assume. People underestimate how fast scikit-learn is for generalized linear models thanks to BLAS and LIBLINEAR.
A Beginner's Guide to Machine Learning (in Python)
In this course, you will learn the basics of Machine Learning and Data Mining; almost everything you need to get started. You will understand what Big Data is and what Data Science and Data Analytics are. You will learn algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Decision Trees, and Neural Networks. You'll also understand how to combine algorithms into ensembles. The course also covers data preprocessing: how to clean your data, transform it, handle categorical features, and handle unbalanced data.
Predictive Modelling in R: Online Training and R Certification Course (Edureka)
This course will introduce you to some of the most widely used predictive modeling techniques and their core principles. Models such as multiple linear regression, logistic regression, auto-regressive integrated moving average (ARIMA), decision trees, and neural networks are frequently used in solving predictive analytics problems.
Super learning in the SAS system
Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. A recent iteration of stacking, called super learning, has been developed as a general approach to black box supervised learning and has seen frequent usage, in part due to the availability of an R package. I develop super learning in the SAS software system using a new macro, and demonstrate its performance relative to the R package. Methods: I follow closely previous work using the R SuperLearner package and assess the performance of super learning in a number of domains. I compare the R package with the new SAS macro in a small set of simulations assessing curve fitting in a prediction model, a set of 14 publicly available datasets to assess cross-validated, expected loss, and data from a randomized trial of job seekers' training to assess the utility of super learning in causal inference using inverse probability weighting. Results: Across the simulated data and the publicly available data, the macro performed similarly to the R package, even with a different set of potential algorithms available natively in R and SAS. The example with inverse probability weighting demonstrated the ability of the SAS macro to include algorithms developed in R. Conclusions: The super learner macro performs as well as the R package at a number of tasks. Further, by extending the macro to include the use of R packages, the macro can leverage both the robust, enterprise oriented procedures in SAS and the nimble, cutting edge packages in R. In the spirit of ensemble learning, this macro extends the potential library of algorithms beyond a single software system and provides a simple avenue into machine learning in SAS.
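The stacking idea described in this abstract can be illustrated with a small sketch. It is not the SAS macro or the R SuperLearner package: the two base learners, the helper names (`fit_mean`, `fit_linear`, `cross_val_predictions`, `stack_two`), the simulated data, and the grid search over a single convex weight are all assumptions made for illustration. The sketch only shows the general pattern of weighting base learners by their cross-validated loss.

```python
import numpy as np

def fit_mean(X, y):
    """Base learner 1 (hypothetical): always predict the training mean."""
    mu = y.mean()
    return lambda X_new: np.full(len(X_new), mu)

def fit_linear(X, y):
    """Base learner 2 (hypothetical): ordinary least squares with intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def cross_val_predictions(learners, X, y, n_folds=5):
    """Out-of-fold predictions, one column per base learner."""
    indices = np.arange(len(y))
    Z = np.zeros((len(y), len(learners)))
    for held_out in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, held_out)
        for j, fit in enumerate(learners):
            Z[held_out, j] = fit(X[train], y[train])(X[held_out])
    return Z

def stack_two(Z, y, grid=101):
    """Weight on learner 0 (the rest goes to learner 1) that minimises
    the cross-validated squared error, found by a simple grid search."""
    candidates = np.linspace(0.0, 1.0, grid)
    losses = [np.mean((w * Z[:, 0] + (1 - w) * Z[:, 1] - y) ** 2)
              for w in candidates]
    return candidates[int(np.argmin(losses))]

# Simulated data (an assumption): a nearly linear signal with small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

Z = cross_val_predictions([fit_mean, fit_linear], X, y)
w_mean = stack_two(Z, y)
weights = np.array([w_mean, 1.0 - w_mean])  # convex combination of learners
```

On this toy problem almost all of the weight should land on the linear learner, since the mean predictor ignores the signal. A fuller implementation would typically solve a constrained optimisation over many learners rather than searching a one-dimensional grid.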
Approximate Newton-based statistical inference using only stochastic gradients
Li, Tianyang, Kyrillidis, Anastasios, Liu, Liu, Caramanis, Constantine
We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and allows the approximation of a Hessian-vector product from first-order information. In theory, our method efficiently computes the statistical error covariance in $M$-estimation, both for unregularized convex learning problems and high-dimensional LASSO regression, without using exact second order information, or resampling the entire data set. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems, that go even beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.
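The finite-difference approximation of a Hessian-vector product mentioned in this abstract can be sketched as follows. This is the generic identity H(θ)v ≈ (∇f(θ + δv) − ∇f(θ)) / δ, not the authors' full inference algorithm; the function name `hvp_finite_difference`, the step size `delta`, and the quadratic toy objective are assumptions, and exact gradients stand in here for the stochastic gradients the paper works with.

```python
import numpy as np

def hvp_finite_difference(grad_fn, theta, v, delta=1e-5):
    """Approximate H(theta) @ v from two gradient evaluations,
    without ever forming the Hessian."""
    return (grad_fn(theta + delta * v) - grad_fn(theta)) / delta

# Toy objective (an assumption): f(theta) = 0.5 * theta^T A theta,
# whose gradient is A @ theta and whose Hessian is exactly A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad_fn(theta):
    return A @ theta

theta = np.array([0.5, -1.0])
v = np.array([1.0, 2.0])

approx = hvp_finite_difference(grad_fn, theta, v)
exact = A @ v  # available here only because the toy Hessian is known
```

For a quadratic objective the approximation matches the exact product up to floating-point rounding. For general objectives the error depends on `delta` and on how fast the Hessian changes near `theta`.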
Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning
Hyvarinen, Aapo, Sasaki, Hiroaki, Turner, Richard E.
Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages.
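The discrimination task described in this abstract starts from a labelled dataset of true and randomized augmented pairs. The sketch below shows one way such a dataset could be built. The helper name `make_contrastive_dataset`, the use of a random permutation to randomize the auxiliary variable, and the placeholder observations are all assumptions; the classifier itself (logistic regression, possibly on top of a neural network) is not shown.

```python
import numpy as np

def make_contrastive_dataset(x, u, rng):
    """Stack true pairs (x, u) with label 1 on top of pairs
    (x, u_shuffled) with label 0, where u_shuffled is a random
    permutation of the auxiliary variable."""
    u_shuffled = u[rng.permutation(len(u))]
    true_pairs = np.column_stack([x, u])
    fake_pairs = np.column_stack([x, u_shuffled])
    features = np.vstack([true_pairs, fake_pairs])
    labels = np.concatenate([np.ones(len(x)), np.zeros(len(x))])
    return features, labels

rng = np.random.default_rng(0)
n = 1000
u = np.arange(n, dtype=float)   # auxiliary variable: here, the time index
x = rng.normal(size=(n, 2))     # placeholder observations (an assumption)

features, labels = make_contrastive_dataset(x, u, rng)
```

A classifier trained on `features` and `labels` can only beat chance by exploiting the dependence between the observations and the auxiliary variable, which is what the discrimination task is meant to capture. With the placeholder noise used here there is no such dependence, so the example only demonstrates the data layout.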
On Coresets for Logistic Regression
Munteanu, Alexander, Schwiegelshohn, Chris, Sohler, Christian, Woodruff, David P.
Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $\mu(X)$ has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset. We illustrate the performance of our method by comparing to uniform sampling as well as to state of the art methods in the area. The experiments are conducted on real world benchmark data for logistic regression.
Adversarial Labeling for Learning without Labels
Arachie, Chidubem, Huang, Bert
We consider the task of training classifiers without labels. We propose a weakly supervised method---adversarial label learning---that trains classifiers to perform well against an adversary that chooses labels for training data. The weak supervision constrains what labels the adversary can choose. The method therefore minimizes an upper bound of the classifier's error rate using projected primal-dual subgradient descent. Minimizing this bound protects against bias and dependencies in the weak supervision. Experiments on three real datasets show that our method can train without labels and outperforms other approaches for weakly supervised learning.