Regression


A Note on Posterior Probability Estimation for Classifiers

arXiv.org Machine Learning

A central theme in classification is estimating the class posterior probability at a new point $\bf{x}$. The vast majority of classifiers output a score for $\bf{x}$ that is monotonically related to the posterior probability through an unknown function, and there have been many attempts in the literature to estimate this function. Here, we provide a way to estimate the posterior probability without using classification scores. Instead, we vary the prior probabilities of the classes to derive the ratio of the pdfs at $\bf{x}$, which directly determines the class posterior probabilities. We consider the binary classification problem.
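The link between the pdf ratio and the posterior is Bayes' rule: with $r(\bf{x}) = p_1(\bf{x})/p_0(\bf{x})$ and class-1 prior $\pi_1$, $P(1 \mid \bf{x}) = \pi_1 r(\bf{x}) / (\pi_1 r(\bf{x}) + 1 - \pi_1)$. The sketch below is our own illustration of why varying the priors reveals $r(\bf{x})$, not the authors' procedure. It assumes a hypothetical `decide(x, prior1)` that behaves like a Bayes-optimal rule rebuilt for each prior, bisects over the prior until the decision at $\bf{x}$ flips, and reads the ratio off the flip point $q$, where $q\,p_1(\bf{x}) = (1-q)\,p_0(\bf{x})$. The Gaussian densities are a toy assumption used only to check the result.

```python
import math

def posterior_from_ratio(ratio, prior1):
    """Bayes' rule: P(class 1 | x) from r = p1(x)/p0(x) and the prior P(class 1)."""
    return prior1 * ratio / (prior1 * ratio + (1.0 - prior1))

def ratio_from_prior_sweep(decide, x, tol=1e-9):
    """Estimate r = p1(x)/p0(x) by locating the prior at which the decision at x flips.

    Hypothetical interface: decide(x, prior1) returns 1 iff
    prior1 * p1(x) > (1 - prior1) * p0(x). At the flip prior q,
    q * p1(x) = (1 - q) * p0(x), hence r = (1 - q) / q.
    Assumes decide(x, tol) == 0 and decide(x, 1 - tol) == 1.
    """
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if decide(x, mid) == 1:
            hi = mid  # the flip lies at or below mid
        else:
            lo = mid  # the flip lies above mid
    q = 0.5 * (lo + hi)
    return (1.0 - q) / q

# Toy check with known class-conditional densities N(0, 1) (class 0) and N(1, 1) (class 1).
def gauss_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def bayes_decide(x, prior1):
    return 1 if prior1 * gauss_pdf(x, 1.0) > (1.0 - prior1) * gauss_pdf(x, 0.0) else 0

r = ratio_from_prior_sweep(bayes_decide, 0.8)
print(r, math.exp(0.3))  # for these densities the true ratio is exp(x - 0.5)
print(posterior_from_ratio(r, 0.3))
```

Once the ratio is known, the posterior for any prior follows from the first function without retraining.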


Machine Learning Basics

#artificialintelligence

Before we start this article on machine learning basics, let us take an example to understand the impact of machine learning on the world. Machine learning has become a dominant force in today's world and has accelerated progress in many fields. In almost every industry you look at, it has brought dramatic changes. Let's take an example from the world of trading. Man Group's AHL Dimension programme is a $5.1 billion hedge fund that is partially managed by AI. By 2015, its machine learning algorithms were contributing more than half of the fund's profits, even though they managed a far smaller share of its assets. Machine learning has become a hot topic, with professionals all over the world signing up for ML or AI courses for fear of being left behind. But what exactly is machine learning? It will be clear to you by the end of this article. Machine learning, as the name suggests, gives machines the ability to learn autonomously from experience, observations and patterns in a given data set, without being explicitly programmed. When we write a program for some specific purpose, we are writing a definite set of instructions that the machine will follow. In machine learning, by contrast, we supply a data set; the machine identifies and analyses the patterns in it and learns to make decisions autonomously based on what it has learned from the data.
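To make the contrast concrete, here is a minimal sketch (our own illustration, not from the article): instead of hard-coding the rule y = 2x + 1, the program estimates the slope and intercept from example pairs by ordinary least squares. The data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from example pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up examples produced by the rule y = 2x + 1; fit_line never sees the rule itself.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # prints 2.0 1.0
```

The program contains no instruction about the rule; changing the examples changes what it learns.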


Aggregated Hold-Out

arXiv.org Machine Learning

Aggregated hold-out (Agghoo) is a method that averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs no worse than the hold-out when the risk is convex. The same holds in classification with the 0-1 risk, up to an additional constant factor. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that similar results can be proved, under appropriate assumptions, for other risk-minimization problems. In particular, we obtain an oracle inequality for regularized kernel regression with a Lipschitz loss, without requiring that the response Y or the regressors be bounded. Numerical experiments show that aggregation brings a significant improvement over the hold-out and that Agghoo is competitive with cross-validation.
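Following the description above, Agghoo repeats the hold-out over several random splits and averages the selected predictors, each trained on its own training part. Below is a minimal standard-library sketch. For illustration only, it assumes 1-D k-nearest-neighbour regression as the family of learning rules, with the number of neighbours k as the hyperparameter; the number of splits and the training fraction are arbitrary choices, not values from the paper.

```python
import math
import random

def knn_predict(train, k, x):
    """Average the responses of the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def holdout_select(train, valid, ks):
    """Hold-out: return the k whose rule, trained on `train`, has the lowest validation MSE."""
    def valid_mse(k):
        return sum((knn_predict(train, k, x) - y) ** 2 for x, y in valid) / len(valid)
    return min(ks, key=valid_mse)

def agghoo(data, ks, n_splits=5, train_frac=0.8, seed=0):
    """Aggregated hold-out: average the hold-out-selected rules over random splits."""
    rng = random.Random(seed)
    selected = []  # one (training part, selected k) pair per split
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, valid = shuffled[:cut], shuffled[cut:]
        selected.append((train, holdout_select(train, valid, ks)))

    def predictor(x):
        # Each selected rule keeps its own training part; their predictions are averaged.
        return sum(knn_predict(train, k, x) for train, k in selected) / len(selected)

    return predictor, [k for _, k in selected]

# Toy data: noisy samples of sin(2*pi*x) on [0, 1].
noise = random.Random(1)
data = [(i / 99, math.sin(2 * math.pi * i / 99) + noise.gauss(0, 0.2)) for i in range(100)]
predict, chosen_ks = agghoo(data, ks=[1, 3, 5, 10, 20])
print(chosen_ks, predict(0.25))
```

With `n_splits=1` the function reduces to the plain hold-out, which makes the two easy to compare.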


A comparison of some conformal quantile regression methods

arXiv.org Machine Learning

Matteo Sesia (1) and Emmanuel J. Candès (1, 2)
(1) Department of Statistics, Stanford University
(2) Department of Mathematics, Stanford University
September 13, 2019

Abstract

We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019 [1]; Kivaranovic et al., 2019 [2]). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare them empirically on simulated and real data. Our results demonstrate that the method in Romano et al. (2019) typically yields tighter prediction intervals in finite samples. Finally, we discuss how to tune these procedures by fixing the relative proportions of observations used for training and conformalization.

1 Introduction

1.1 Background and motivation

Given a set of $n$ points $\{(X_i, Y_i)\}_{i=1}^{n}$, with $Y_i \in \mathbb{R}$ and $X_i \in \mathbb{R}^d$, we consider the problem of constructing a prediction interval for a new point $Y_{n+1}$ based on the observed value of $X_{n+1}$, assuming only that $\{(X_i, Y_i)\}_{i=1}^{n+1}$ are drawn exchangeably from some common distribution $P_{XY}$. There exists a vast selection of statistical and machine learning algorithms that can provide approximate answers to this question [3, 4].
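The conformalization step of the Romano et al. (2019) method can be sketched as follows. The quantile regressors themselves are not shown: `lo_pred` and `hi_pred` are assumed to be their lower and upper quantile predictions on a held-out calibration set, fitted on a separate training set. Each calibration score measures how far $Y_i$ falls outside the predicted band, and the band at $X_{n+1}$ is widened by the $\lceil (1-\alpha)(n_{\text{cal}}+1) \rceil$-th smallest score. The calibration values are made up, and the variant of Kivaranovic et al. (2019) is not shown.

```python
import math

def cqr_correction(lo_pred, hi_pred, y_cal, alpha):
    """Finite-sample correction computed from calibration data (split conformal, CQR-style)."""
    # Positive score: y lies outside [lo, hi] by that amount; negative: inside by that margin.
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lo_pred, hi_pred, y_cal))
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))  # 1-based rank of the conformal quantile
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

def cqr_interval(lo_x, hi_x, correction):
    """Widen (or shrink, if correction < 0) the quantile band at a new point."""
    return lo_x - correction, hi_x + correction

# Made-up calibration data with constant quantile predictions [0, 1].
lo_pred = [0.0] * 9
hi_pred = [1.0] * 9
y_cal = [0.5, 1.2, -0.3, 0.9, 2.0, 0.1, 0.7, 1.5, 0.4]
q = cqr_correction(lo_pred, hi_pred, y_cal, alpha=0.2)
print(q, cqr_interval(0.0, 1.0, q))  # prints 0.5 (-0.5, 1.5)
```

Because the correction can be negative, the method can also shrink bands that the quantile regressors made too wide.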


Regularization in Machine Learning

#artificialintelligence

Hello Guys, this blog contains all you need to know about regularization: the mathematical intuition behind it and its implementation in Python. It is intended especially for newbies who find regularization difficult to digest. For any machine learning enthusiast, understanding the mathematical intuition and the underlying workings is more important than just implementing the model. I am new to the world of blogging, so if anyone encounters any problem, whether conceptual or language-related, please comment below. Back in the day, when I came across regularization, I found it difficult to get the mathematical intuition behind it.
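As a minimal sketch of the intuition (our own example; the post's implementation is not reproduced here), consider L2 (ridge) regularization for a single feature with no intercept. Minimizing $\sum_i (y_i - w x_i)^2 + \lambda w^2$ and setting the derivative $-2\sum_i x_i (y_i - w x_i) + 2\lambda w$ to zero gives $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$, so a larger penalty $\lambda$ shrinks the weight toward zero. The data are made up.

```python
def ridge_weight_1d(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)**2) + lam * w**2 (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
for lam in [0.0, 14.0, 100.0]:
    # The weight shrinks from the unpenalized fit toward zero as lam grows.
    print(lam, ridge_weight_1d(xs, ys, lam))
```

Here the unpenalized weight is 2.0; with `lam = 14.0` the denominator doubles and the weight halves to 1.0.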


#006A Fast Logistic Regression Master Data Science

#artificialintelligence

When we are programming Logistic Regression or Neural Networks, we should avoid explicit for loops. It's not always possible, but when we can, we should use built-in functions or find some other way to compute the result. Vectorizing the implementation of Logistic Regression makes the code highly efficient. In this post we will see how to use this technique to compute each gradient-descent step without a single explicit for loop. The non-vectorized version of this code was highly inefficient, so we need to transform it.
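A minimal NumPy sketch of the vectorized update, assuming the training examples are stored as the columns of `X` (shape `(n_features, m)`) and the labels in a `(1, m)` array `y`; the learning rate, iteration count and toy data are arbitrary. Only the loop over gradient-descent iterations remains, and each step processes all `m` examples with matrix operations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression, vectorized over examples.

    X: (n_features, m), one training example per column; y: (1, m) labels in {0, 1}.
    """
    n_features, m = X.shape
    w = np.zeros((n_features, 1))
    b = 0.0
    for _ in range(n_iters):       # the only remaining loop: over iterations
        A = sigmoid(w.T @ X + b)   # (1, m) predicted probabilities for all examples
        dZ = A - y                 # (1, m) derivative of the cross-entropy loss w.r.t. z
        dw = (X @ dZ.T) / m        # (n_features, 1) averaged weight gradient
        db = np.sum(dZ) / m        # scalar bias gradient
        w -= lr * dw
        b -= lr * db
    return w, b

X = np.array([[-2.0, -1.0, 1.0, 2.0]])  # toy data: one feature, four examples
y = np.array([[0, 0, 1, 1]])
w, b = train_logistic(X, y)
preds = (sigmoid(w.T @ X + b) > 0.5).astype(int)
print(w.ravel(), b, preds)
```

The matrix products `w.T @ X` and `X @ dZ.T` replace the inner loops over examples and features that a non-vectorized version would need.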


Adversarial Orthogonal Regression: Two non-Linear Regressions for Causal Inference

arXiv.org Machine Learning

We propose two nonlinear regression methods, named Adversarial Orthogonal Regression (AdOR) for additive noise models and Adversarial Orthogonal Structural Equation Model (AdOSE) for the general case of structural equation models. Both methods aim to make the regression residual independent of the regressors while making no assumption about the noise distribution. In both methods, two adversarial networks are trained simultaneously: a regression network that outputs predictions and a loss network that estimates mutual information (in AdOR) or KL divergence (in AdOSE). These methods can be formulated as a minimax two-player game; at equilibrium, AdOR finds a deterministic map between inputs and output and estimates the mutual information between residual and inputs, while AdOSE estimates the conditional probability distribution of the output given the inputs. The proposed methods can be used as subroutines to address several learning problems in causality, such as causal direction determination (or, more generally, causal structure learning) and causal model estimation. Synthetic and real-world experiments demonstrate that the proposed methods perform remarkably well compared with previous solutions.


Super learning for daily streamflow forecasting: Large-scale demonstration and comparison with multiple machine learning algorithms

arXiv.org Machine Learning

Daily streamflow forecasting through data-driven approaches is traditionally performed using a single machine learning algorithm. Existing applications are mostly restricted to the examination of a few case studies, which does not allow an accurate assessment of the predictive performance of the algorithms involved. Here we propose super learning (a type of ensemble learning) by combining 10 machine learning algorithms. We apply the proposed algorithm in one-step-ahead forecasting mode. For the application, we use a large dataset consisting of 10-year-long time series of daily streamflow, precipitation and temperature from 511 basins. The super learner improves on the performance of the linear regression algorithm by 20.06%, outperforming the "hard to beat in practice" equal-weight combiner, which improves on linear regression by 19.21%. The best-performing individual machine learning algorithm is neural networks, which improve on linear regression by 16.73%, followed by extremely randomized trees (16.40%), XGBoost (15.92%), loess (15.36%), random forests (12.75%), polyMARS (12.36%), MARS (4.74%), lasso (0.11%) and support vector regression (-0.45%). Based on these large-scale results, we propose super learning for daily streamflow forecasting.
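The abstract does not specify the meta-learner, so the following is a sketch under a common assumption: the super learner combines the base learners with convex weights (non-negative, summing to one) chosen to minimize squared error on their out-of-fold predictions, whereas the equal-weight combiner fixes every weight at 1/K. Here the weights are fitted by projected gradient descent onto the probability simplex; the base learners and the cross-validation that produce `P` are not shown, and the toy data are made up.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def super_learner_weights(P, y, n_iters=2000):
    """Convex weights minimizing the mean squared error of P @ w against y.

    P: (n, K) out-of-fold predictions of K base learners; y: (n,) observations.
    """
    n, K = P.shape
    step = n / (2.0 * np.linalg.norm(P, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.full(K, 1.0 / K)                       # start from the equal-weight combiner
    for _ in range(n_iters):
        grad = (2.0 / n) * P.T @ (P @ w - y)
        w = project_to_simplex(w - step * grad)
    return w

# Toy check: learner 0 reproduces y exactly, learner 1 does not.
y = np.array([1.0, 2.0, 3.0, 4.0])
P = np.column_stack([y, [4.0, 3.0, 2.0, 1.0]])
w = super_learner_weights(P, y)
print(w)  # weights close to [1, 0]
```

Starting from equal weights makes the design choice explicit: the super learner can only move away from the equal-weight combiner if the out-of-fold errors justify it.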


Sparse linear regression with compressed and low-precision data via concave quadratic programming

arXiv.org Machine Learning

We consider the problem of recovering a $k$-sparse vector from compressed linear measurements when the data are corrupted by quantization noise. When the number of measurements is not sufficiently large, different $k$-sparse solutions may be present in the feasible set, and the classical $\ell_1$ approach may fail. Motivated by this, we propose a non-convex quadratic programming method that exploits prior information on the magnitude of the non-zero parameters. This yields more efficient support recovery. We provide sufficient conditions for successful recovery, as well as numerical simulations that illustrate the practical feasibility of the proposed method.


Deep Learning and MARS: A Connection

arXiv.org Machine Learning

We consider least squares regression estimates using deep neural networks. We show that these estimates satisfy an oracle inequality, which implies that (up to a logarithmic factor) their error is at least as small as the optimal error bound one would expect for MARS if that procedure worked optimally. As a result, we show that our neural networks achieve a dimensionality reduction when the regression function locally has low dimensionality. This assumption seems realistic in real-world applications, since high-dimensional data are often confined to locally low-dimensional distributions. In our simulation study, we provide numerical experiments to support our theoretical results and to compare our estimate with other conventional nonparametric regression estimates, especially MARS. The use of our estimates is illustrated through a real data analysis.