Regression
Adaptive Bayesian Linear Regression for Automated Machine Learning
Zhou, Weilin, Precioso, Frederic
To solve a machine learning problem, one typically needs to perform data preprocessing, modeling, and hyperparameter tuning, which is known as model selection and hyperparameter optimization. The goal of automated machine learning (AutoML) is to design methods that can automatically perform model selection and hyperparameter optimization without human intervention for a given dataset. In this paper, we propose a meta-learning method that can efficiently search for a high-performance machine learning pipeline from a predefined set of candidate pipelines for supervised classification datasets by leveraging meta-data collected from previous experiments. More specifically, our method combines an adaptive Bayesian regression model with a neural network basis function and the acquisition function from Bayesian optimization. The adaptive Bayesian regression model is able to capture knowledge from previous meta-data and thus make predictions of the performances of machine learning pipelines on a new dataset. The acquisition function is then used to guide the search of possible pipelines based on the predictions. The experiments demonstrate that our approach can quickly identify high-performance pipelines for a range of test datasets and outperforms the baseline methods.
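The abstract does not say which acquisition function is used. As a minimal sketch, assuming expected improvement (a common choice in Bayesian optimization) and assuming the regression model returns a Gaussian mean and standard deviation for each candidate pipeline, the search step could look roughly like this. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement of a Gaussian prediction over best_so_far.

    Assumes higher scores are better (e.g. accuracy). This is an assumed
    acquisition function, not necessarily the one used in the paper.
    """
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def pick_next_pipeline(predictions, best_so_far):
    """Return the candidate with the largest expected improvement.

    predictions: dict mapping a (hypothetical) pipeline name to (mean, std).
    """
    return max(
        predictions,
        key=lambda name: expected_improvement(*predictions[name], best_so_far),
    )


if __name__ == "__main__":
    # Hypothetical predicted accuracies for two candidate pipelines.
    candidates = {"pipeline_a": (0.70, 0.01), "pipeline_b": (0.65, 0.10)}
    print(pick_next_pipeline(candidates, best_so_far=0.72))
```

In this toy setting the more uncertain candidate is chosen even though its predicted mean is lower, which illustrates how an acquisition function trades off exploitation against exploration.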
10 Algorithms Every Machine Learning Enthusiast Should Know
It is crucial for machine learning enthusiasts to know and understand the basic and important machine learning algorithms in order to keep up with current trends. In this article, we list 10 basic algorithms that play very important roles in the machine learning era. Logistic regression, also known as the logit classifier, is a popular mathematical modelling procedure used in the analysis of data. It is the regression analysis to conduct when the dependent variable is binary, i.e. 0 or 1. In logistic regression, the logistic function describes the mathematical form on which the logistic model is based.
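As a small illustration of the logistic function mentioned above, the sketch below maps a linear score to a probability for the binary outcome. The weights, bias, and feature values are made up for the example.

```python
import math


def logistic(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    # Branch on the sign of z so that exp() never receives a large
    # positive argument, which would overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """P(y = 1 | x) under a logistic model with the given parameters."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(score)


if __name__ == "__main__":
    # Hypothetical parameters and a single two-feature example.
    print(predict_proba([0.8, -0.4], 0.1, [1.0, 2.0]))
```

A predicted probability above 0.5 would usually be read as class 1, although the threshold is a modelling choice.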
Discriminative Regression Machine: A Classifier for High-Dimensional Data or Imbalanced Data
We introduce a discriminative regression approach to supervised classification in this paper. It estimates a representation model while accounting for discriminativeness between classes, thereby enabling accurate derivation of categorical information. This new type of regression model extends existing models such as ridge, lasso, and group lasso by explicitly incorporating discriminative information. As a special case, we focus on a quadratic model that admits a closed-form analytical solution. The corresponding classifier is called the discriminative regression machine (DRM). Three iterative algorithms are further established for the DRM to enhance its efficiency and scalability for real applications. Our approach and the algorithms are applicable to general types of data including images, high-dimensional data, and imbalanced data. We compare the DRM with current state-of-the-art classifiers. Our extensive experimental results show superior performance of the DRM and confirm the effectiveness of the proposed approach.
Scalable and Efficient Hypothesis Testing with Random Forests
Coleman, Tim, Peng, Wei, Mentch, Lucas
Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has established important statistical properties like consistency and asymptotic normality by considering subsampling in lieu of bootstrapping. Though such results open the door to traditional inference procedures, all formal methods suggested thus far place severe restrictions on the testing framework and their computational overhead precludes their practical scientific use. Here we propose a permutation-style testing approach to formally assess feature significance. We establish asymptotic validity of the test via exchangeability arguments and show that the test maintains high power with orders of magnitude fewer computations. As importantly, the procedure scales easily to big data settings where large training and testing sets may be employed without the need to construct additional models. Simulations and applications to ecological data where random forests have recently shown promise are provided.
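The abstract does not give the details of the test, so the following is only a generic one-sided permutation test on two sets of prediction errors, not the authors' procedure. It assumes one set of errors comes from models trained on the original data and the other from models trained with the feature of interest scrambled. All names and numbers are hypothetical.

```python
import random


def permutation_p_value(errors_original, errors_scrambled,
                        n_permutations=2000, seed=0):
    """One-sided permutation p-value for 'scrambling increases error'.

    The test statistic is the difference in mean error
    (scrambled minus original). Group labels are shuffled at random
    to approximate the null distribution of that statistic.
    """
    rng = random.Random(seed)
    n_orig = len(errors_original)
    pooled = list(errors_original) + list(errors_scrambled)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(errors_scrambled) - mean(errors_original)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = mean(pooled[n_orig:]) - mean(pooled[:n_orig])
        if stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    original = [0.10, 0.11, 0.09, 0.10, 0.12, 0.10, 0.11, 0.09]
    scrambled = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47]
    print(permutation_p_value(original, scrambled))
```

A small p-value suggests that the feature carries predictive signal, under the exchangeability assumption that labels can be swapped freely when the feature is irrelevant.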
A Generative Map for Image-based Camera Localization
Guo, Mingpan, Matthes, Stefan, Ye, Jiaojiao, Shen, Hao
In image-based camera localization systems, information about the environment is usually stored in some representation, which can be referred to as a map. Conventionally, most maps are built upon hand-crafted features. Recently, neural networks have attracted attention as a data-driven map representation, and have shown promising results in visual localization. However, these neural network maps are generally hard for humans to interpret. A readable map is not only accessible to humans, but also provides a way to be verified when the ground truth pose is unavailable. To tackle this problem, we propose Generative Map, a new framework for learning human-readable neural network maps, by combining a generative model with the Kalman filter, which also allows it to incorporate additional sensor information such as stereo visual odometry. For evaluation, we use real-world images from the 7-Scenes and Oxford RobotCar datasets. We demonstrate that our Generative Map can be queried with a pose of interest from the test sequence to predict an image, which closely resembles the true scene. For localization, we show that Generative Map achieves performance comparable to current regression models. Moreover, our framework is trained completely from scratch, unlike regression models which rely on large ImageNet pretrained networks.
How Widely Can Prediction Models be Generalized? An Analysis of Performance Prediction in Blended Courses
Gitinabard, Niki, Xu, Yiqiao, Heckman, Sarah, Barnes, Tiffany, Lynch, Collin F.
Blended courses that mix in-person instruction with online platforms are increasingly popular in secondary education. These tools record a rich amount of data on students' study habits and social interactions. Prior research has shown that these metrics are correlated with students' performance in face to face classes. However, predictive models for blended courses are still limited and have not yet succeeded at early prediction or cross-class predictions even for repeated offerings of the same course. In this work, we use data from two offerings of two different undergraduate courses to train and evaluate predictive models on student performance based upon persistent student characteristics including study habits and social interactions. We analyze the performance of these models on the same offering, on different offerings of the same course, and across courses to see how well they generalize. We also evaluate the models on different segments of the courses to determine how early reliable predictions can be made. This work tells us in part how much data is required to make robust predictions and how cross-class data may be used, or not, to boost model performance. The results of this study will help us better understand how similar the study habits, social activities, and the teamwork styles are across semesters for students in each performance category. These trained models also provide an avenue to improve our existing support platforms to better support struggling students early in the semester with the goal of providing timely intervention.
Comparison of statistical post-processing methods for probabilistic NWP forecasts of solar radiation
Bakker, Kilian, Whan, Kirien, Knap, Wouter, Schmeits, Maurice
The increased usage of solar energy places additional importance on forecasts of solar radiation. Solar panel power production is primarily driven by the amount of solar radiation and it is therefore important to have accurate forecasts of solar radiation. Accurate forecasts that also give information on the forecast uncertainties can help users of solar energy to make better solar radiation based decisions related to the stability of the electrical grid. To achieve this, we apply statistical post-processing techniques that determine relationships between observations of global radiation (made within the KNMI network of automatic weather stations in the Netherlands) and forecasts of various meteorological variables from the numerical weather prediction (NWP) model HARMONIE-AROME (HA) and the atmospheric composition model CAMS. Those relationships are used to produce probabilistic forecasts of global radiation. We compare 7 different statistical post-processing methods, consisting of two parametric and five non-parametric methods. We find that all methods are able to generate probabilistic forecasts that improve the raw global radiation forecast from HA according to the root mean squared error (on the median) and the potential economic value. Additionally, we show how important the predictors are in the different regression methods. We also compare the regression methods using various probabilistic scoring metrics, namely the continuous ranked probability skill score, the Brier skill score and reliability diagrams. We find that quantile regression and generalized random forests generally perform best. In (near) clear sky conditions the non-parametric methods have more skill than the parametric ones.
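To make one of the scoring metrics named above concrete, here is a minimal sketch of the Brier skill score for probabilistic forecasts of a binary event (for instance, radiation exceeding some threshold). The event definition, the constant reference forecast, and the numbers are assumptions for illustration and do not come from the study.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def brier_skill_score(probabilities, reference_probabilities, outcomes):
    """Skill relative to a reference forecast: 1 is perfect, 0 is no better."""
    reference = brier_score(reference_probabilities, outcomes)
    if reference == 0.0:
        raise ValueError("reference forecast is perfect; skill is undefined")
    return 1.0 - brier_score(probabilities, outcomes) / reference


if __name__ == "__main__":
    outcomes = [1, 0, 1, 0]            # whether the event occurred
    forecast = [0.9, 0.1, 0.8, 0.2]    # post-processed probabilities (made up)
    reference = [0.5, 0.5, 0.5, 0.5]   # hypothetical constant reference forecast
    print(brier_skill_score(forecast, reference, outcomes))
```

Positive values indicate that the forecast improves on the reference, and negative values indicate that it is worse.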
Copula-like Variational Inference
Hirt, Marcel, Dellaportas, Petros, Durmus, Alain
This paper considers a new family of variational distributions motivated by Sklar's theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of the state space. The proposed variational densities can then be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
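For context, Sklar's theorem, which the abstract cites as motivation, can be stated as follows. The notation ($F$, $F_i$, $C$, $c$, $f_i$) is chosen here for illustration and is not taken from the paper. For a joint distribution function $F$ on $\mathbb{R}^d$ with marginal distribution functions $F_1, \dots, F_d$, there exists a copula $C$, i.e. a distribution function on $[0,1]^d$ with uniform marginals, such that

$$F(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big).$$

When the relevant densities exist, this gives

$$f(x_1, \dots, x_d) = c\big(F_1(x_1), \dots, F_d(x_d)\big) \prod_{i=1}^{d} f_i(x_i),$$

where $c$ is the density of $C$ and $f_i$ is the density of $F_i$. A true copula has uniform marginals; according to the abstract, the "copula-like" densities relax exactly this requirement.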
Pólygamma Data Augmentation to address Non-conjugacy in the Bayesian Estimation of Mixed Multinomial Logit Models
Bansal, Prateek, Krueger, Rico, Bierlaire, Michel, Daziano, Ricardo A., Rashidi, Taha H.
The standard Gibbs sampler for Mixed Multinomial Logit (MMNL) models involves sampling from the conditional densities of the utility parameters using the Metropolis-Hastings (MH) algorithm, because no conjugate prior is available for the logit kernel. To address this non-conjugacy concern, we propose applying the Pólygamma data augmentation (PG-DA) technique to MMNL estimation. The posterior estimates of the augmented and the default Gibbs sampler are similar for the two-alternative scenario (binary choice), but we encounter empirical identification issues in the case of more alternatives ($J \geq 3$).
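The abstract does not spell out the augmentation itself. As a sketch, assuming PG-DA refers to the standard Pólya-Gamma augmentation for logit-type likelihoods, the integral identity it rests on can be written as below. The symbols $\psi$, $a$, $b$, $\kappa$, and $\omega$ are illustrative and are not taken from the paper.

$$\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}} = 2^{-b} e^{\kappa \psi} \int_{0}^{\infty} e^{-\omega \psi^{2} / 2} \, p(\omega) \, d\omega, \qquad \kappa = a - \frac{b}{2},$$

where $p(\omega)$ is the density of a $\mathrm{PG}(b, 0)$ random variable. Conditional on $\omega$, the left-hand side is proportional to a Gaussian kernel in $\psi$, so a Gaussian prior on parameters entering $\psi$ linearly becomes conditionally conjugate, and the auxiliary variable is updated from $\omega \mid \psi \sim \mathrm{PG}(b, \psi)$.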
XGBoost Algorithm: Long May She Reign!
I still remember day 1 of my very first job, fifteen years ago. I had just finished my graduate studies and joined a global investment bank as an analyst. On my first day, I kept straightening my tie and trying to remember everything that I had studied. Meanwhile, deep down, I wondered if I was good enough for the corporate world. The only thing that you need to know is the regression modeling!"