 Regression


De-Biasing The Lasso With Degrees-of-Freedom Adjustment

arXiv.org Machine Learning

This paper studies schemes to de-bias the Lasso in sparse linear regression where the goal is to estimate and construct confidence intervals for a low-dimensional projection of the unknown coefficient vector in a preconceived direction $a_0$. We assume that the design matrix has iid Gaussian rows with known covariance matrix $\Sigma$. Our analysis reveals that previous proposals to de-bias the Lasso require a modification in order to enjoy asymptotic efficiency in a full range of the level of sparsity. This modification takes the form of a degrees-of-freedom adjustment that accounts for the dimension of the model selected by the Lasso. Let $s_0$ denote the number of nonzero coefficients of the true coefficient vector. The unadjusted de-biasing schemes proposed in previous studies enjoy efficiency if $s_0\lll n^{2/3}$, up to logarithmic factors. However, if $s_0\ggg n^{2/3}$, the unadjusted scheme cannot be efficient in certain directions $a_0$. In the latter regime, it is necessary to modify existing procedures by an adjustment that accounts for the degrees-of-freedom of the Lasso. The proposed degrees-of-freedom adjustment grants asymptotic efficiency for any direction $a_0$. This holds under a Sparse Riesz Condition on the covariance matrix $\Sigma$ and the sample size requirements $s_0/p\to0$ and $s_0\log(p/s_0)/n\to0$. Our analysis also highlights that the degrees-of-freedom adjustment is not necessary when the initial bias of the Lasso in the direction $a_0$ is small, which is granted under more stringent conditions on $\Sigma^{-1}$. This explains why the necessity of degrees-of-freedom adjustment did not appear in some previous studies. The main proof argument involves a Gaussian interpolation path similar to that used to derive Slepian's lemma. It yields a sharp $\ell_\infty$ error bound for the Lasso under Gaussian design which is of independent interest.


Can We Apply Linear Regression to Non-linear Data? - Machine Learning Interview Questions

#artificialintelligence

A common interview question is "Can we apply linear regression to non-linear data?" Watch this video to understand the question and how to answer it in an interview. For course details, please visit: https://datamites.com/ You can learn business statistics, Tableau, deep learning, data mining, and more.
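
One common way to answer this question (not necessarily the one given in the video) is that a model only has to be linear in its coefficients, so non-linear data can often be handled by regressing on transformed inputs such as powers of x. The sketch below fits a quadratic by ordinary least squares using the normal equations. The function names and the toy data are illustrative assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = w0 + w1*x + ... + wd*x^d.

    The model is non-linear in x but linear in the weights, so ordinary
    linear regression applies once x is expanded into powers.
    """
    features = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(d)]
           for i in range(d)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(d)]
    return solve_linear_system(xtx, xty)


# Toy data (assumed for illustration): a noiseless quadratic.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
weights = fit_polynomial(xs, ys, degree=2)
print([round(w, 6) for w in weights])
```

With noisy data the recovered weights would only approximate the true ones, and for high degrees a QR-based solver is usually preferred over the normal equations for numerical stability.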


Apache Spark Machine Learning Tutorial

#artificialintelligence

Editor's Note: Download this Free eBook: Getting Started with Apache Spark 2.x – from Inception to Production. In this blog post, we will give an introduction to machine learning and deep learning, and we will go over the main Spark machine learning algorithms and techniques with some real-world use cases. The goal is to give you a better understanding of what you can do with machine learning. Machine learning is becoming more accessible to developers, and data scientists work with domain experts, architects, developers, and data engineers, so it is important for everyone to have a better understanding of the possibilities. Every piece of information that your business generates has potential to add value. This overview is meant to provoke a review of your own data to identify new opportunities.


Semi-supervised Approach to Soft Sensor Modeling for Fault Detection in Industrial Systems with Multiple Operation Modes

arXiv.org Machine Learning

In industrial systems, certain process variables that need to be monitored for detecting faults are often difficult or impossible to measure. Soft sensor techniques are widely used to estimate such difficult-to-measure process variables from easy-to-measure ones. Soft sensor modeling requires training datasets including the information of various states such as operation modes, but the fault dataset with the target variable is insufficient as the training dataset. This paper describes a semi-supervised approach to soft sensor modeling to incorporate an incomplete dataset without the target variable in the training dataset. To incorporate the incomplete dataset, we consider the properties of processes at transition points between operation modes in the system. The regression coefficients of the operation modes are estimated under constraint conditions obtained from the information on the mode transitions. In a case study, this constrained soft sensor modeling was used to predict refrigerant leaks in air-conditioning systems with heating and cooling operation modes. The results show that this modeling method is promising for soft sensors in a system with multiple operation modes.


The Generalized Complex Kernel Least-Mean-Square Algorithm

arXiv.org Machine Learning

We propose a novel adaptive kernel based regression method for complex-valued signals: the generalized complex-valued kernel least-mean-square (gCKLMS). We borrow from the new results on widely linear reproducing kernel Hilbert space (WL-RKHS) for nonlinear regression and complex-valued signals, recently proposed by the authors. This paper shows that in the adaptive version of the kernel regression for complex-valued signals we need to include another kernel term, the so-called pseudo-kernel. This new solution is endowed with better representation capabilities in complex-valued fields, since it can efficiently decouple the learning of the real and the imaginary part. Also, we review previous realizations of the complex KLMS algorithm and its augmented version to prove that they can be rewritten as particular cases of the gCKLMS. Furthermore, important conclusions on the kernels design are drawn that help to greatly improve the convergence of the algorithms. In the experiments, we revisit the nonlinear channel equalization problem to highlight the better convergence of the gCKLMS compared to previous solutions. Also, the flexibility of the proposed generalized approach is tested in a second experiment with non-independent real and imaginary parts. The results illustrate the significant performance improvements of the gCKLMS approach when the complex-valued signals have different properties for the real and imaginary parts.


Spatial Analysis Made Easy with Linear Regression and Kernels

arXiv.org Machine Learning

Kernel methods are an incredibly popular technique for extending linear models to non-linear problems via a mapping to an implicit, high-dimensional feature space. While kernel methods are computationally cheaper than an explicit feature mapping, they are still subject to cubic cost on the number of points. Given only a few thousand locations, this computational cost rapidly outstrips the currently available computational power. This paper aims to provide an overview of kernel methods from first principles (with a focus on ridge regression), before progressing to a review of random Fourier features (RFF), a set of methods that enable the scaling of kernel methods to big datasets. At each stage, the associated R code is provided. We begin by illustrating how the dual representation of ridge regression relies solely on inner products and permits the use of kernels to map the data into high-dimensional spaces. We progress to RFFs, showing how only a few lines of code provide a significant computational speed-up for a negligible cost to accuracy. We provide an example of the implementation of RFFs on a simulated spatial data set to illustrate these properties. Lastly, we summarise the main issues with RFFs and highlight some of the advanced techniques aimed at alleviating them.
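
The paper's own code is in R and is not reproduced here. As a rough illustration of the random Fourier feature idea, the Python sketch below builds an explicit feature map whose inner product approximates the Gaussian (RBF) kernel $\exp(-\|x-y\|^2/(2\ell^2))$. The function names, the lengthscale, the number of features, and the test points are all assumptions chosen for illustration.

```python
import math
import random


def make_rff_map(dim, n_features, lengthscale, seed=0):
    """Return a map z(.) such that z(x) . z(y) approximates the RBF kernel.

    Frequencies are drawn from N(0, 1/lengthscale^2), the spectral density
    of the RBF kernel, and phases uniformly from [0, 2*pi).
    """
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
              for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, phases)
        ]

    return feature_map


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel, used only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


# Hypothetical two-dimensional locations.
feature_map = make_rff_map(dim=2, n_features=4000, lengthscale=1.0)
x, y = [0.3, -0.5], [1.0, 0.2]
approx = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
exact = rbf_kernel(x, y, lengthscale=1.0)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

Once the data are mapped through `feature_map`, ordinary (primal) ridge regression on the resulting features scales with the number of features rather than cubically with the number of points, which is where the speed-up comes from.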


Online Sampling from Log-Concave Distributions

arXiv.org Machine Learning

Given a sequence of convex functions $f_0, f_1, \ldots, f_T$, we study the problem of sampling from the Gibbs distribution $\pi_t \propto e^{-\sum_{k=0}^t f_k}$ for each epoch $t$ in an online manner. This problem occurs in applications to machine learning, Bayesian statistics, and optimization where one constantly acquires new data, and must continuously update the distribution. Our main result is an algorithm that generates independent samples from a distribution that is a fixed $\varepsilon$ TV-distance from $\pi_t$ for every $t$ and, under mild assumptions on the functions, makes poly$\log(T)$ gradient evaluations per epoch. All previous results for this problem imply a bound on the number of gradient or function evaluations which is at least linear in $T$. While we assume the functions have bounded second moment, we do not assume strong convexity. In particular, we show that our assumptions hold for online Bayesian logistic regression, when the data satisfy natural regularity properties. In simulations, our algorithm achieves accuracy comparable to that of a Markov chain specialized to logistic regression. Our main result also implies the first algorithm to sample from a $d$-dimensional log-concave distribution $\pi_T \propto e^{-\sum_{k=0}^T f_k}$ where the $f_k$'s are not assumed to be strongly convex and the total number of gradient evaluations is roughly $T\log(T)+\mathrm{poly}(d),$ as opposed to $T\cdot \mathrm{poly}(d)$ implied by prior works. Key to our algorithm is a novel stochastic gradient Langevin dynamics Markov chain that has a carefully designed variance reduction step built-in with fixed constant batch size. Technically, lack of strong convexity is a significant barrier to the analysis, and, here, our main contribution is a martingale exit time argument showing the chain is constrained to a ball of radius roughly poly$\log(T)$ for the duration of the algorithm.


Machine Learning with Python: NLP and Text Recognition

#artificialintelligence

Student and freelance AI / Big Data Developer with a passion for full stack. In this article, I apply a series of natural language processing techniques to a dataset containing reviews about businesses. After that, I train a model using Logistic Regression to forecast whether a review is "positive" or "negative". The natural language processing field contains a series of tools that are very useful to extract, label, and forecast information starting from raw text data. This collection of techniques is mainly used in the fields of emotion recognition, text tagging (for example, to automate the process of sorting complaints from a client), chatbots, and vocal assistants.


Kaggle Earthquake Prediction Challenge

#artificialintelligence

The popular Data Science competition website Kaggle has an ongoing competition to solve the problem of earthquake prediction. Given a dataset of seismographic activity from a laboratory simulation, participants are asked to create a predictive model for earthquakes. In this video, I'll attempt the challenge as a way to teach three concepts: the Data Science mindset, Categorical Boosting, and Support Vector Regression models. I'll be coding this in Python from start to finish in the online Google Colab environment. That's what keeps me going.


On the consistency of supervised learning with missing values

arXiv.org Machine Learning

In many application settings, the data are plagued with missing features. These hinder data analysis. An abundant literature addresses missing values in an inferential framework, where the aim is to estimate parameters and their variance from incomplete tables. Here, we consider supervised-learning settings where the objective is to best predict a target when missing values appear in both training and test sets. We analyze which missing-values strategies lead to good prediction. We show the consistency of two approaches to estimating the prediction function. The most striking one shows that the widely used method of mean imputation prior to learning is consistent when missing values are not informative. This is in contrast with inferential settings, as mean imputation is known to have serious drawbacks in terms of deformation of the joint and marginal distribution of the data. That such a simple approach can be consistent has important consequences in practice. This result holds asymptotically when the learning algorithm is consistent in itself. We contribute additional analysis on decision trees as they can naturally tackle empirical risk minimization with missing values. This is due to their ability to handle the half-discrete nature of variables with missing values. After comparing theoretically and empirically different missing-values strategies in trees, we recommend using the "missing incorporated in attributes" method as it can handle both non-informative and informative missing values.
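
As a minimal sketch of mean imputation prior to learning, the Python snippet below computes column means on the training set only and uses them to fill missing entries in both the training and test sets. It is not the authors' code. The helper names, the use of `None` to mark a missing value, the toy data, and the fallback of 0.0 for a column with no observed values are all assumptions made for illustration.

```python
def column_means(rows):
    """Mean of the observed (non-None) entries in each column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: fall back to 0.0 if a column is entirely missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace each missing entry with the corresponding column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Toy feature tables with missing entries marked as None.
train = [[1.0, None], [3.0, 4.0], [None, 8.0]]
test = [[None, 5.0], [2.0, None]]

# Means are estimated on the training set and reused for the test set.
means = column_means(train)
train_filled = impute(train, means)
test_filled = impute(test, means)
print(means, test_filled)
```

The filled tables can then be passed to any learner. The "missing incorporated in attributes" strategy recommended for trees works differently, since it lets each split decide which side missing values go to, and it is not shown here.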