Goto

Collaborating Authors

 Regression


Regularization: Machine Learning

#artificialintelligence

For understanding the concept of regularization and its link with Machine Learning, we first need to understand why do we need regularization. We all know Machine learning is about training a model with relevant data and using the model to predict unknown data. By the word unknown, it means the data which the model has not seen yet. We have trained the model, and are getting good scores while using training data. But during the process of prediction, we found that the model is underperforming when compared to the training part. Now, this may be a case of over-fitting(about which I will be explaining below) which is causing incorrect prediction by the model.


12+ BEST Machine Learning with Python Masterclass [2020] [UPDATE] - Gift Course

#artificialintelligence

Do you want to become an expert Python Developer? Get started with the Python Masterclass which consists of top 12 online tutorials to make your learning easy! This is An Ultimate Python Masterclass: Get 12 Exclusive Machine Learning Courses. This Machine Learning masterclass covers all essential concepts of Python and Machine Learning in addition to over 100 practical projects. Python was developed because the creator was frustrated by not being able to find exactly what he wanted from a programming language.


Tradeoff-Focused Contrastive Explanation for MDP Planning

arXiv.org Artificial Intelligence

End-users' trust in automated agents is important as automated decision-making and planning is increasingly used in many aspects of people's lives. In real-world applications of planning, multiple optimization objectives are often involved. Thus, planning agents' decisions can involve complex tradeoffs among competing objectives. It can be difficult for the end-users to understand why an agent decides on a particular planning solution on the basis of its objective values. As a result, the users may not know whether the agent is making the right decisions, and may lack trust in it. In this work, we contribute an approach, based on contrastive explanation, that enables a multi-objective MDP planning agent to explain its decisions in a way that communicates its tradeoff rationale in terms of the domain-level concepts. We conduct a human subjects experiment to evaluate the effectiveness of our explanation approach in a mobile robot navigation domain. The results show that our approach significantly improves the users' understanding, and confidence in their understanding, of the tradeoff rationale of the planning agent.


Rule-based Bayesian regression

arXiv.org Machine Learning

We introduce a novel rule-based approach for handling regression problems. The new methodology carries elements from two frameworks: (i) it provides information about the uncertainty of the parameters of interest using Bayesian inference, and (ii) it allows the incorporation of expert knowledge through rule-based systems. The blending of those two different frameworks can be particularly beneficial for various domains (e.g. engineering), where, even though the significance of uncertainty quantification motivates a Bayesian approach, there is no simple way to incorporate researcher intuition into the model. We validate our models by applying them to synthetic applications: a simple linear regression problem and two more complex structures based on partial differential equations. Finally, we review the advantages of our methodology, which include the simplicity of the implementation, the uncertainty reduction due to the added information and, in some occasions, the derivation of better point predictions, and we address limitations, mainly from the computational complexity perspective, such as the difficulty in choosing an appropriate algorithm and the added computational burden.


R squared Does Not Measure Predictive Capacity or Statistical Adequacy - KDnuggets

#artificialintelligence

The R-squared Goodness-of-Fit measure is one of the most widely available statistics accompanying the output of regression analysis in statistical software. Perhaps partially due to its widespread availability, it is also one of the most often misunderstood ones. In a regression with a single independent variable R2 is calculated as the ratio between the variation explained by the model and the total observed variation. It is often called the coefficient of determination and can be interpreted as the proportion of variation explained by the posed predictor. In such a case, it is equivalent to the square of the correlation coefficient of the observed and fitted values of the variable.


Convergence of Sparse Variational Inference in Gaussian Processes Regression

arXiv.org Machine Learning

Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, $N$, due to the cubic (in $N$) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on $M \ll N$ inducing variables to form an approximation at a cost of $\mathcal{O}(NM^2)$. While the computational cost appears linear in $N$, the true complexity depends on how $M$ must scale with $N$ to ensure a certain quality of the approximation. In this work, we investigate upper and lower bounds on how $M$ needs to grow with $N$ to ensure high quality approximations. We show that we can make the KL-divergence between the approximate model and the exact posterior arbitrarily small for a Gaussian-noise regression model with $M\ll N$. Specifically, for the popular squared exponential kernel and $D$-dimensional Gaussian distributed covariates, $M=\mathcal{O}((\log N)^D)$ suffice and a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.


Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome

arXiv.org Machine Learning

Summary: Building classification models that predict a binary class label on the basis of high dimensional multi-omics datasets poses several challenges, due to the typically widely differing characteristics of the data layers in terms of number of predictors, type of data, and levels of noise. Previous research has shown that applying classical logistic regression with elastic-net penalty to these datasets can lead to poor results (Liu et al., 2018). We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately and a predictive model is then built using the variables selected in the first step. Here, our approach is compared to other methods that have been developed for the same purpose, and we adapt existing software for multi-omic linear regression (Zhao and Zucknick, 2020) to the logistic regression setting. Extensive simulation studies show that our approach should be preferred if the goal is to select as many relevant predictors as possible, as well as achieving prediction performances comparable to those of the best competitors. Our motivating example is a cardiometabolic syndrome dataset comprising eight'omic data types for 2 extreme phenotype groups (10 obese and 10 lipodystrophy individuals) and 185 blood donors. Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.


Explainable-AI: Where Supervised Learning Can Falter

#artificialintelligence

Disclaimer: I'll be talking mainly about logistic-regression and basic feed-forward neural networks, so its helpful to have programmed with those 2 models before reading this piece. OK -- before statisticians and ML folks come running after me after reading the title, I'm not talking about linear regression, for example. Yes, in linear regression, you can use the R-squared (or adjusted R-squared statistic) to talk about explained variance, and since linear regression only involves addition between independent variables (or predictors), they're pretty interpretable. If you were doing a linear regression to predict, say the price of a car Car_Price, based on the number of seats, mileage, maximum-speed, and battery life, your linear model could be โ€“โ€“ say Car_Price c1*Seats c2*Mileage c3*Speed c4*Battery_Power โ€“โ€“ the fact that variables are only added makes it pretty interpretable. But when it comes to more complex prediction models like Logistic Regression and neural networks, everything about the predictors (or called "features" in ML) becomes more confusing.


A Multi-Variate Triple-Regression Forecasting Algorithm for Long-Term Customized Allergy Season Prediction

arXiv.org Machine Learning

In this paper, we propose a novel multi-variate algorithm using a triple-regression methodology to predict the airborne-pollen allergy season that can be customized for each patient in the long term. To improve the prediction accuracy, we first perform a pre-processing to integrate the historical data of pollen concentration and various inferential signals from other covariates such as the meteorological data. We then propose a novel algorithm which encompasses three-stage regressions: in Stage 1, a regression model to predict the start/end date of a airborne-pollen allergy season is trained from a feature matrix extracted from 12 time series of the covariates with a rolling window; in Stage 2, a regression model to predict the corresponding uncertainty is trained based on the feature matrix and the prediction result from Stage 1; in Stage 3, a weighted linear regression model is built upon prediction results from Stage 1 and 2. It is observed and proved that Stage 3 contributes to the improved forecasting accuracy and the reduced uncertainty of the multi-variate triple-regression algorithm. Based on different allergy sensitivity level, the triggering concentration of the pollen - the definition of the allergy season can be customized individually. In our backtesting, a mean absolute error (MAE) of 4.7 days was achieved using the algorithm. We conclude that this algorithm could be applicable in both generic and long-term forecasting problems.


Rademacher upper bounds for cross-validation errors with an application to the lasso

arXiv.org Machine Learning

We establish a general upper bound for $K$-fold cross-validation ($K$-CV) errors that can be adapted to many $K$-CV-based estimators and learning algorithms. Based on Rademacher complexity of the model and the Orlicz-$\Psi_{\nu}$ norm of the error process, the CV error upper bound applies to both light-tail and heavy-tail error distributions. We also extend the CV error upper bound to $\beta$-mixing data using the technique of independent blocking. We provide a Python package (\texttt{CVbound}, \url{https://github.com/isaac2math}) for computing the CV error upper bound in $K$-CV-based algorithms. Using the lasso as an example, we demonstrate in simulations that the upper bounds are tight and stable across different parameter settings and random seeds. As well as accurately bounding the CV errors for the lasso, the minimizer of the new upper bounds can be used as a criterion for variable selection. Compared with the CV-error minimizer, simulations show that tuning the lasso penalty parameter according to the minimizer of the upper bound yields a more sparse and more stable model that retains all of the relevant variables.