Regression
Robust mixture of experts modeling using the skew $t$ distribution
Mixture of Experts (MoE) is a popular framework in the fields of statistics and machine learning for modeling heterogeneity in data for regression, classification and clustering. MoE for continuous data are usually based on the normal distribution. However, it is known that for data with asymmetric behavior, heavy tails and atypical observations, the use of the normal distribution is unsuitable. We introduce a new robust non-normal mixture of experts modeling using the skew $t$ distribution. The proposed skew $t$ mixture of experts, named STMoE, handles these issues of the normal mixtures experts regarding possibly skewed, heavy-tailed and noisy data. We develop a dedicated expectation conditional maximization (ECM) algorithm to estimate the model parameters by monotonically maximizing the observed data log-likelihood. We describe how the presented model can be used in prediction and in model-based clustering of regression data. Numerical experiments carried out on simulated data show the effectiveness and the robustness of the proposed model in fitting non-linear regression functions as well as in model-based clustering. Then, the proposed model is applied to the real-world data of tone perception for musical data analysis, and the one of temperature anomalies for the analysis of climate change data. The obtained results confirm the usefulness of the model for practical data analysis applications.
Robust mixture of experts modeling using the $t$ distribution
Mixture of Experts (MoE) is a popular framework for modeling heterogeneity in data for regression, classification, and clustering. For regression and cluster analyses of continuous data, MoE usually use normal experts following the Gaussian distribution. However, for a set of data containing a group or groups of observations with heavy tails or atypical observations, the use of normal experts is unsuitable and can unduly affect the fit of the MoE model. We introduce a robust MoE modeling using the $t$ distribution. The proposed $t$ MoE (TMoE) deals with these issues regarding heavy-tailed and noisy data. We develop a dedicated expectation-maximization (EM) algorithm to estimate the parameters of the proposed model by monotonically maximizing the observed data log-likelihood. We describe how the presented model can be used in prediction and in model-based clustering of regression data. The proposed model is validated on numerical experiments carried out on simulated data, which show the effectiveness and the robustness of the proposed model in terms of modeling non-linear regression functions as well as in model-based clustering. Then, it is applied to the real-world data of tone perception for musical data analysis, and the one of temperature anomalies for the analysis of climate change data. The obtained results show the usefulness of the TMoE model for practical applications.
Part 2: Should we Send the Azure Machine Learning Model to Market? Mariner
The previous post concentrated on deciding if a categorical Machine Learning model should be released for production or not. This post concentrates on interpreting the scores of a regression model and the implication in using it in a decision management feature. Decision management solutions apply business rules written by humans and automatically apply them to the cases they are presented. Digital masters are more likely to include the output from a machine learning prediction into their decision management systems than those who are just starting their digital journey. If the last inspection was 30 days ago then put the battery on the "inspection due" list.
Beginners Guide to Regression Analysis and Plot Interpretations
"The road to machine learning starts with Regression. If you are aspiring to become a data scientist, regression is the first algorithm you need to learnmaster. Not just to clear job interviews, but to solve real world problems. Till today, a lot of consultancy firms continue to use regression techniques at a larger scale to help their clients. No doubt, it's one of the easiest algorithms to learn, but it requires persistent effort to get to the master level.
Predicting Car Prices Part 1: Linear Regression
Let's walk through an example of predictive analytics using a data set that most people can relate to:prices of cars. In this case, we have a data set with historical Toyota Corolla prices along with related car attributes. In predictive models, there is a response variable(also called dependent variable), which is the variable that we are interested in predicting. The independent variables(the predictors also called features in the machine learning community) are one or more numeric variables we are using to predict the response variable. Given we are using a linear regression model, we are assuming the relationship between the independent and dependent variables follow a straight line.
A Communication-Efficient Parallel Method for Group-Lasso
Group-Lasso (gLasso) identifies important explanatory factors in predicting the response variable by considering the grouping structure over input variables. However, most existing algorithms for gLasso are not scalable to deal with large-scale datasets, which are becoming a norm in many applications. In this paper, we present a divide-and-conquer based parallel algorithm (DC-gLasso) to scale up gLasso in the tasks of regression with grouping structures. DC-gLasso only needs two iterations to collect and aggregate the local estimates on subsets of the data, and is provably correct to recover the true model under certain conditions. We further extend it to deal with overlappings between groups. Empirical results on a wide range of synthetic and real-world datasets show that DC-gLasso can significantly improve the time efficiency without sacrificing regression accuracy.
Optimized Linear Imputation
Resheff, Yehezkel S., Weinshall, Daphna
Often in real-world datasets, especially in high dimensional data, some feature values are missing. Since most data analysis and statistical methods do not handle gracefully missing values, the first step in the analysis requires the imputation of missing values. Indeed, there has been a long standing interest in methods for the imputation of missing values as a pre-processing step. One recent and effective approach, the IRMI stepwise regression imputation method, uses a linear regression model for each real-valued feature on the basis of all other features in the dataset. However, the proposed iterative formulation lacks convergence guarantee. Here we propose a closely related method, stated as a single optimization problem and a block coordinate-descent solution which is guaranteed to converge to a local minimum. Experiments show results on both synthetic and benchmark datasets, which are comparable to the results of the IRMI method whenever it converges. However, while in the set of experiments described here IRMI often does not converge, the performance of our methods is shown to be markedly superior in comparison with other methods.
Monte Carlo Simulation Basics, III: Regression Model Estimators
This post is the third in a series of posts that I'm writing about Monte Carlo (MC) simulation, especially as it applies to econometrics. If you've already seen the first two posts in the series ( and) then you'll know that my intention is to provide a very elementary introduction to this topic. There are lots of details that I've been avoiding, deliberately. In this post we're going to pick up from where the previous post about estimator properties based on the sampling distribution left off. Specifically, I'll be applying the ideas that were introduced in that post in the context of regression analysis. We'll take a look at the properties of the Least Squares estimator in three different situations. In doing so, I'll be able to illustrate, through simulation, some "text book" results that will know about already. If you haven't read the immediately preceding post in this series already, I urge you to do so before continuing.
The best kept secret about linear and logistic regression
All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or regression parameters). The non-statistical approach is also more robust than theory described in all statistics textbooks and taught in all statistical courses. It does not require Map-Reduce when data is really big, nor any matrix inversion, maximum likelihood estimation, or mathematical optimization (Newton algorithm). It is indeed incredibly simple, robust, easy to interpret, and easy to code (no statistical libraries required).
Logistic Regression (Credit Scoring) Modeling using SAS
I am a seasoned Analytics professional with 15 years of professional experience. I have industry experience of impactful and actionable analytics. I am a keen trainer, who believes that training is all about making users understand the concepts. If students remain confused after the training, the training is useless. I ensure that after my training, students (or partcipants) are crystal clear on how to use the learning in their business scenarios.