Regression
5 Steps to Maximize Business Impact with Machine Learning
Regardless of the metric you decide to optimize, it's necessary to establish a baseline measure of performance. This baseline provides a point of comparison that enables you to track your progress. It also allows you to judge the rate of return you'll get by increasing the complexity of your modeling solution. Suppose you work for a real estate firm and are asked to build a model to predict the price of a house. You decide to optimize for RMSE and build a linear regression model with features including the square footage of the house, the number of bedrooms and bathrooms, and other information.
KNG: The K-Norm Gradient Mechanism
Reimherr, Matthew, Awan, Jordan
This paper presents a new mechanism for producing sanitized statistical summaries that achieve \emph{differential privacy}, called the \emph{K-Norm Gradient} Mechanism, or KNG. This new approach maintains the strong flexibility of the exponential mechanism, while achieving the powerful utility performance of objective perturbation. KNG starts with an inherent objective function (often an empirical risk), and promotes summaries that are close to minimizing the objective by weighting according to how far the gradient of the objective function is from zero. Working with the gradient instead of the original objective function allows for additional flexibility as one can penalize using different norms. We show that, unlike the exponential mechanism, the noise added by KNG is asymptotically negligible compared to the statistical error for many problems. In addition to theoretical guarantees on privacy and utility, we confirm the utility of KNG empirically in the settings of linear and quantile regression through simulations.
Multi-Task Kernel Null-Space for One-Class Classification
Arashloo, Shervin Rahimzadeh, Kittler, Josef
The one-class kernel spectral regression (OC-KSR), the regression-based formulation of the kernel null-space approach has been found to be an effective Fisher criterion-based methodology for one-class classification (OCC), achieving state-of-the-art performance in one-class classification while providing relatively high robustness against data corruption. This work extends the OC-KSR methodology to a multi-task setting where multiple one-class problems share information for improved performance. By viewing the multi-task structure learning problem as one of compositional function learning, first, the OC-KSR method is extended to learn multiple tasks' structure \textit{linearly} by posing it as an instantiation of the separable kernel learning problem in a vector-valued reproducing kernel Hilbert space where an output kernel encodes tasks' structure while another kernel captures input similarities. Next, a non-linear structure learning mechanism is proposed which captures multiple tasks' relationships \textit{non-linearly} via an output kernel. The non-linear structure learning method is then extended to a sparse setting where different tasks compete in an output composition mechanism, leading to a sparse non-linear structure among multiple problems. Through extensive experiments on different data sets, the merits of the proposed multi-task kernel null-space techniques are verified against the baseline as well as other existing multi-task one-class learning techniques.
Logistic Regression with Python
Logistic regression was once the most popular machine learning algorithm, but the advent of more accurate algorithms for classification such as support vector machines, random forest, and neural networks has induced some machine learning engineers to view logistic regression as obsolete. Though it may have been overshadowed by more advanced methods, its simplicity makes it the ideal algorithm to use as an introduction to the study of machine learning. Like most machine learning algorithms, logistic regression creates a boundary edge between binary labels. The purpose of a training process is to place this edge in such a way that most of the labels are divided so as to maximize the accuracy of predictions. The training process requires correct model architecture and fine-tuned hyperparameters, whereas data play the most significant role in determining the prediction accuracy.
Multi-Class classification with Sci-kit learn & XGBoost: A case study using Brainwave data
In Machine learning, classification problems with high-dimensional data are really challenging. Sometimes, very simple problems become extremely complex due this'curse of dimensionality' problem. In this article, we will see how accuracy and performance vary across different classifiers. We will also see how, when we don't have the freedom to choose a classifier independently, we can do feature engineering to make a poor classifier perform well. For this article, we will use the "EEG Brainwave Dataset" from Kaggle.
Logistic regression as a neural network
As a teacher of Data Science (Data Science for Internet of Things course at the University of Oxford), I am always fascinated in cross connection between concepts. To recap, Logistic regression is a binary classification method. It can be modelled as a function that can take in any number of inputs and constrain the output to be between 0 and 1. This means, we can think of Logistic Regression as a one-layer neural network. For a binary output, if the true label is y (y 0 or y 1) and y_hat is the predicted output โ then y_hat represents the probability that y 1 - given inputs w and x. Therefore, the probability that y 0 given inputs w and x is (1 - y_hat), as shown below.
Demand forecasting techniques for build-to-order lean manufacturing supply chains
Rivera-Castro, Rodrigo, Nazarov, Ivan, Xiang, Yuke, Pletneev, Alexander, Maksimov, Ivan, Burnaev, Evgeny
Build-to-order (BTO) supply chains have become common-place in industries such as electronics, automotive and fashion. They enable building products based on individual requirements with a short lead time and minimum inventory and production costs. Due to their nature, they differ significantly from traditional supply chains. However, there have not been studies dedicated to demand forecasting methods for this type of setting. This work makes two contributions. First, it presents a new and unique data set from a manufacturer in the BTO sector. Second, it proposes a novel data transformation technique for demand forecasting of BTO products. Results from thirteen forecasting methods show that the approach compares well to the state-of-the-art while being easy to implement and to explain to decision-makers.
Learning Ensembles of Anomaly Detectors on Synthetic Data
Smolyakov, D., Sviridenko, N., Ishimtsev, V., Burikov, E., Burnaev, E.
The main aim of this work is to develop and implement an automatic anomaly detection algorithm for meteorological time-series. To achieve this goal we develop an approach to constructing an ensemble of anomaly detectors in combination with adaptive threshold selection based on artificially generated anomalies. We demonstrate the efficiency of the proposed method by integrating the corresponding implementation into ``Minimax-94'' road weather information system.
Conditionally-additive-noise Models for Structure Learning
Chicharro, Daniel, Panzeri, Stefano, Shpitser, Ilya
Methods based on additive-noise (AN) models have been proposed to further discriminate between causal structures that are equivalent in terms of conditional independencies. These methods rely on a particular form of the generative functional equations, with an additive noise structure, which allows inferring the directionality of causation by testing the independence between the residuals of a nonlinear regression and the predictors (nrr-independencies). Full causal structure identifiability has been proven for systems that contain only additive-noise equations and have no hidden variables. We extend the AN framework in several ways. We introduce alternative regression-free tests of independence based on conditional variances (cv-independencies). We consider conditionally-additive-noise (CAN) models, in which the equations may have the AN form only after conditioning. We exploit asymmetries in nrr-independencies or cv-independencies resulting from the CAN form to derive a criterion that infers the causal relation between a pair of variables in a multivariate system without any assumption about the form of the equations or the presence of hidden variables.
Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets
Kath, Christopher, Ziel, Florian
We discuss a concept denoted as Conformal Prediction (CP) in this paper. While initially stemming from the world of machine learning, it was never applied or analyzed in the context of short-term electricity price forecasting. Therefore, we elaborate the aspects that render Conformal Prediction worthwhile to know and explain why its simple yet very efficient idea has worked in other fields of application and why its characteristics are promising for short-term power applications as well. We compare its performance with different state-of-the-art electricity price forecasting models such as quantile regression averaging (QRA) in an empirical out-of-sample study for three short-term electricity time series. We combine Conformal Prediction with various underlying point forecast models to demonstrate its versatility and behavior under changing conditions. Our findings suggest that Conformal Prediction yields sharp and reliable prediction intervals in short-term power markets. We further inspect the effect each of Conformal Prediction's model components has and provide a path-based guideline on how to find the best CP model for each market.