 Regression


Machine Learning in Python - Feature Selection - Step Up Analytics

#artificialintelligence

The data features that we use to train our machine learning models have a huge influence on the performance we can achieve. Irrelevant or partially relevant features can negatively impact model performance. Feature selection is a process where we automatically select those features in our data that contribute most to the prediction variable or output in which we are interested. Having irrelevant features in our data can decrease the accuracy of many models, especially linear algorithms like linear and logistic regression. We can learn more about feature selection with scikit-learn in the article Feature selection.
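
As a rough illustration of automatic feature selection, the sketch below ranks features by the absolute value of their Pearson correlation with the target and keeps the top k. This is a simple univariate filter written in plain Python, not scikit-learn's API; the function names `pearson` and `select_k_best` and the toy data are assumptions made for this example.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5

def select_k_best(columns, target, k):
    """Return the names of the k columns with the largest |correlation| to target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: 'signal' drives the target, 'noise' is arbitrary,
# and 'constant' never varies.
columns = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
selected = select_k_best(columns, target, k=1)
print(selected)
```

A correlation filter only detects linear, one-feature-at-a-time relationships, so it is a starting point rather than a complete selection procedure.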


Regression Basics For Business Analysis

#artificialintelligence

If you've ever wondered how two or more things relate to each other, or if you've ever had your boss ask you to create a forecast or analyze relationships between variables, then learning regression would be worth your time. In this article, you'll learn the basics of simple linear regression - a tool commonly used in forecasting and financial analysis. We will begin by learning the core principles of regression, first learning about covariance and correlation, and then moving on to building and interpreting a regression output. A lot of software such as Microsoft Excel can do all the regression calculations and outputs for you, but it is still important to learn the underlying mechanics. At the center of regression is the relationship between two variables called the dependent and independent variables.
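
The mechanics mentioned above (covariance, correlation, then the regression line) can be sketched in a few lines of plain Python. The function name and the sample numbers are hypothetical; the formulas are the standard sample covariance and least-squares slope and intercept.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares and report the building blocks."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sample covariance and variances (divide by n - 1).
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    slope = cov_xy / var_x            # change in y per unit change in x
    intercept = my - slope * mx       # the fitted line passes through (mean x, mean y)
    correlation = cov_xy / (var_x * var_y) ** 0.5
    return {"covariance": cov_xy, "correlation": correlation,
            "slope": slope, "intercept": intercept}

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
fit = simple_linear_regression(x, y)
print(fit)
```

Because the slope is the covariance divided by the variance of x, the sign of the covariance determines whether the fitted line rises or falls.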


Expectile Matrix Factorization for Skewed Data Analysis

arXiv.org Machine Learning

Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose expectile matrix factorization by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings.
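
To make the idea of asymmetric least squares concrete, the sketch below computes the expectile of a plain list of numbers: residuals above the estimate get weight tau and residuals below get weight 1 - tau, and the estimate is the weighted mean under those weights. This is only the scalar building block, not the paper's matrix factorization algorithm; the function name, the iteration scheme and the data are assumptions for illustration.

```python
def expectile(values, tau, iterations=100, tol=1e-12):
    """Minimise sum_i w_i * (v_i - m)^2 with w_i = tau if v_i > m else 1 - tau.

    Solved by repeatedly recomputing the weighted mean; tau must lie in (0, 1).
    """
    m = sum(values) / len(values)  # start from the ordinary mean (tau = 0.5)
    for _ in range(iterations):
        weights = [tau if v > m else 1.0 - tau for v in values]
        new_m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m

# Hypothetical skewed sample: one large value pulls the upper tail.
data = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 20.0]
low = expectile(data, 0.1)
mid = expectile(data, 0.5)   # symmetric weights recover the sample mean
high = expectile(data, 0.9)
print(low, mid, high)
```

With tau = 0.5 the weights are symmetric and the result is the ordinary least-squares answer, the mean; moving tau toward 0 or 1 shifts the estimate toward the lower or upper tail.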


Encrypted accelerated least squares regression

arXiv.org Machine Learning

Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme. Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption. The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique. This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy.


Facebook's Prophet uses Stan

#artificialintelligence

I wanted to tell you about an open source forecasting package we just released called Prophet: I thought the readers of your blog might be interested in both the package and the fact that we built it on top of Stan. Under the hood, Prophet uses Stan for optimization (and sampling if the user desires) in order to fit a non-linear additive model and generate uncertainty intervals. The big win for us was that 1) Stan does a great job at letting us separate optimization from the model code and 2) we could share the same core procedure between Python and R implementations. One of the neat things we do is automatically detect changepoints in the time series by specifying a sequence of potential parameter changes and shrinking the shifts using a Laplace prior. We also let the user adjust the flexibility of the model by tuning the precision of the priors, which we think is intuitive for most users.


Bayesian Analysis for a Logistic Regression Model - MATLAB & Simulink Example

#artificialintelligence

Bayesian inference is the process of analyzing statistical models with the incorporation of prior knowledge about the model or model parameters. The root of such inference is Bayes' theorem: In this formula mu and tau, sometimes known as hyperparameters, are also known. The following graph shows the prior, likelihood, and posterior for theta. In some simple problems such as the previous normal mean inference example, it is easy to figure out the posterior distribution in a closed form. But in general problems that involve non-conjugate priors, the posterior distributions are difficult or impossible to compute analytically.
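
As a sketch of a case where the posterior has a closed form, the code below assumes the common normal-mean setup: observations y_i ~ N(theta, sigma^2) with sigma known, and a normal prior theta ~ N(mu, tau^2) with known hyperparameters mu and tau. That model is an assumption made here for illustration, since the formula itself is not reproduced in the text above; the function name and the data are hypothetical.

```python
def normal_posterior(data, sigma, mu, tau):
    """Posterior mean and variance of theta for y_i ~ N(theta, sigma^2), theta ~ N(mu, tau^2)."""
    n = len(data)
    # Precisions (inverse variances) of the prior and the likelihood add up.
    precision = 1.0 / tau ** 2 + n / sigma ** 2
    # The posterior mean is a precision-weighted blend of prior mean and data.
    post_mean = (mu / tau ** 2 + sum(data) / sigma ** 2) / precision
    return post_mean, 1.0 / precision

# Hypothetical example: prior centred at 0, four observations equal to 2.
post_mean, post_var = normal_posterior([2.0, 2.0, 2.0, 2.0], sigma=1.0, mu=0.0, tau=1.0)
print(post_mean, post_var)
```

The posterior mean lands between the prior mean and the sample mean, and moves closer to the sample mean as more data arrive. With a non-conjugate prior no such shortcut exists, which is why sampling methods are used instead.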


A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

arXiv.org Artificial Intelligence

Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively by visually assessing the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.


Lipschitz Optimisation for Lipschitz Interpolation

arXiv.org Machine Learning

Supervised machine learning methods are algorithms for inductive inference. On the basis of a sample, they construct (learn) a computable model of a data generating process that facilitates inference over the underlying ground truth function and aims to predict its function values at unobserved inputs. Among supervised learning methods, nonparametric algorithms tend to offer greater flexibility to learn rich function classes. Unfortunately, many classical techniques for nonparametric regression, such as the Nadaraya-Watson estimator [21], [14] or the LOESS method [6], suffer from a practical limitation: their regression performance depends on the choice of hyperparameters. While in principle it would be possible to tune these to the data (in a manner similar in spirit to the one we propose in this work), to the best of our knowledge there is currently little understanding of how to do so with a global optimiser that offers theoretical performance guarantees on the optimisation solution. This means that in practice, one is left to engineer these hyperparameters (or the settings of an optimiser) by manual tuning in order to ensure good performance on a particular learning problem. Of course, this stands in opposition to the motivation for utilising nonparametric learning, especially in system identification, which is to facilitate flexible and fully automated black-box learning that does not require manual intervention.
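
The dependence on hyperparameters can be seen in a minimal Nadaraya-Watson sketch. The Gaussian kernel, the function name and the toy data below are assumptions chosen for illustration; the bandwidth is the hyperparameter in question.

```python
import math

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Kernel-weighted average of y_train, using a Gaussian kernel centred at x_query."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

# Hypothetical training data sampled from y = x^2.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 1.0, 4.0, 9.0, 16.0]

narrow = nadaraya_watson(x_train, y_train, 2.0, bandwidth=0.1)    # nearly interpolates
wide = nadaraya_watson(x_train, y_train, 2.0, bandwidth=100.0)    # nearly the global mean
print(narrow, wide)
```

At the same query point the prediction moves from roughly the nearest training value to roughly the average of all training values as the bandwidth grows, which is why the bandwidth has to be chosen with care.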


how to choose predictive variables in my time series regression model

@machinelearnbot

Business knowledge (domain expertise) could definitely help in pruning the set of variables from the starting set of 300 to a smaller set. But even if you cut it down to 100 variables, taking those and lags of different orders on these variables, you could have an overwhelming number of "explanatory" variables to forecast the dependent variable (daily sales). Sometimes a model in which the lag of the dependent variable is used as an explanatory variable along with the other selected variables among the 300 (perhaps with lags of a few of them, based on intuition) will not only reduce the number of explanatory variables and thereby increase the degrees of freedom for the prediction model but also provide more stable predictions. Also one can make use of the first so many principal components among the chosen predictor variables to deal with multicollinearity issues which typically arise in such problems. This also cuts down the number of parameters and thereby increases the df of the model predictions.
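
Using the lag of the dependent variable as an explanatory variable can be sketched as follows. The example fits sales_t = a + b * sales_(t-1) by least squares on a made-up series; the function names and the numbers are hypothetical, and a real model would add the other selected predictors as further columns.

```python
def make_lagged_pairs(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier: (y[t-lag], y[t])."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]

def fit_line(pairs):
    """Least-squares intercept and slope for y = a + b * x over (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical daily sales that follow sales_t = 10 + 0.5 * sales_(t-1) exactly.
sales = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5]
a, b = fit_line(make_lagged_pairs(sales, lag=1))
next_day = a + b * sales[-1]   # one-step-ahead forecast
print(a, b, next_day)
```

One lagged column stands in for much of the history that would otherwise need many separate lagged predictors, which is how this approach keeps the parameter count down.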


SAGA and Restricted Strong Convexity

arXiv.org Machine Learning

SAGA is a fast incremental gradient method for finite-sum problems, and its effectiveness has been tested on a wide range of applications. In this paper, we analyze SAGA on a class of non-strongly convex and non-convex statistical problems such as Lasso, group Lasso, logistic regression with $\ell_1$ regularization, linear regression with SCAD regularization and Correct Lasso. We prove that SAGA enjoys a linear convergence rate up to the statistical estimation accuracy, under the assumption of restricted strong convexity (RSC). This significantly extends the applicability of SAGA in convex and non-convex optimization.
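
For readers unfamiliar with SAGA, here is a minimal sketch of the basic update on a plain least-squares finite sum. It keeps a table of the most recent gradient of each term and corrects each stochastic gradient with that table. This is the unregularized, strongly convex textbook case, not the Lasso, SCAD or RSC settings the paper analyzes; the function name, the data and the step size of 1/(3L) are assumptions for this example.

```python
import random

def saga_least_squares(X, y, step, iterations, seed=0):
    """Minimise (1/n) * sum_i 0.5 * (x_i . w - y_i)^2 with the SAGA update."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def grad(i, w):
        # Gradient of the i-th term: (x_i . w - y_i) * x_i
        residual = sum(xj * wj for xj, wj in zip(X[i], w)) - y[i]
        return [residual * xj for xj in X[i]]

    table = [grad(i, w) for i in range(n)]                  # last gradient seen per term
    avg = [sum(g[j] for g in table) / n for j in range(d)]  # running mean of the table
    for _ in range(iterations):
        i = rng.randrange(n)
        g_new = grad(i, w)
        # SAGA direction: fresh gradient, minus the stored one, plus the table average.
        w = [wj - step * (gn - go + a)
             for wj, gn, go, a in zip(w, g_new, table[i], avg)]
        avg = [a + (gn - go) / n for a, gn, go in zip(avg, g_new, table[i])]
        table[i] = g_new
    return w

# Hypothetical noiseless data from y = 2 * x + 1; the second column is the intercept.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
L = max(sum(v * v for v in row) for row in X)   # smoothness constant of the terms
w = saga_least_squares(X, y, step=1.0 / (3.0 * L), iterations=20000)
print(w)
```

The stored gradient table is what separates SAGA from plain stochastic gradient descent: it reduces the variance of the update so that a constant step size can be used.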