Regression
Forecasting of commercial sales with large scale Gaussian Processes
Rivera, Rodrigo, Burnaev, Evgeny
This paper argues that applications of Gaussian Processes in the fast-moving consumer goods industry have received too little discussion. Yet the technique can be valuable: for example, it can provide automatic relevance determination of features, and the posterior mean can reveal insights into the data. Significant challenges are the large size and high dimensionality of commercial point-of-sale data. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales, and shows the value of this type of model as a decision-making tool for management.
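Large-scale GP methods typically avoid inverting the full n x n kernel matrix by using a low-rank approximation. As one illustration (our choice; not necessarily among the approaches the paper evaluates), the sketch below computes the subset-of-regressors (SoR) posterior mean with m inducing inputs and an ARD squared-exponential kernel. The function names, inducing-point placement, and length scales are hypothetical. In practice the length scales would be fitted by maximizing the (approximate) marginal likelihood, and a large fitted length scale marks an input as irrelevant, which is what automatic relevance determination reports.

```python
import numpy as np


def ard_rbf(A, B, lengthscales, variance=1.0):
    """Squared-exponential kernel with one length scale per input dimension (ARD)."""
    A_s, B_s = A / lengthscales, B / lengthscales
    sq = (A_s ** 2).sum(1)[:, None] + (B_s ** 2).sum(1)[None, :] - 2.0 * A_s @ B_s.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))


def sor_posterior_mean(X, y, Z, X_new, lengthscales, noise_var):
    """Subset-of-regressors posterior mean using m inducing inputs Z.

    Cost is O(n m^2) instead of the O(n^3) of an exact GP.
    """
    K_mn = ard_rbf(Z, X, lengthscales)
    K_mm = ard_rbf(Z, Z, lengthscales)
    K_sm = ard_rbf(X_new, Z, lengthscales)
    # mean = K_sm (noise_var * K_mm + K_mn K_nm)^{-1} K_mn y, plus a small jitter for stability
    A = noise_var * K_mm + K_mn @ K_mn.T + 1e-8 * np.eye(len(Z))
    return K_sm @ np.linalg.solve(A, K_mn @ y)


# Toy data (hypothetical): only the first of two inputs drives the target
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 2))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(2000)
Z = np.column_stack([np.linspace(-3, 3, 12), np.zeros(12)])  # hypothetical inducing inputs
lengthscales = np.array([0.8, 100.0])  # hypothetical: the long scale treats input 2 as irrelevant
X_new = np.column_stack([np.linspace(-2.5, 2.5, 20), np.zeros(20)])
mu = sor_posterior_mean(X, y, Z, X_new, lengthscales, noise_var=0.05 ** 2)
```

Only the m x m system is solved, so memory and time grow linearly in n for fixed m.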
Predictive Analytics in Finance - Wolfram Community
Classification and clustering, which we discussed in previous installments, extend naturally into another field of data mining: prediction. Here we take the discussion of machine learning one step further and focus on predictive analysis. Prediction builds on those techniques and uses pattern detection and similarity features in the data to estimate future outcomes. This is particularly relevant to finance, where the ability to predict the values of less liquid instruments from groups of related data is of high interest. We demonstrate prediction using credit default swap (CDS) data and show non-regression models to be superior methods for predictive analysis.
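As a minimal sketch of similarity-based prediction, assuming a k-nearest-neighbour estimator (our choice of non-regression model; the article's own models are not reproduced here), the function below estimates a target for a query point as the inverse-distance-weighted mean of its k closest neighbours in feature space. The feature values and spreads are hypothetical placeholders, not CDS data.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Inverse-distance-weighted k-nearest-neighbour estimate of the target at `query`."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))[:k]
    for d, y in nearest:
        if d == 0.0:
            return y  # exact match in feature space: reuse its observed value
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical features of liquid reference names (e.g. leverage, volatility) and their spreads
train_X = [(0.2, 0.1), (0.4, 0.3), (0.8, 0.6), (0.9, 0.9)]
train_y = [50.0, 80.0, 150.0, 220.0]
estimate = knn_predict(train_X, train_y, (0.3, 0.2), k=2)  # about 65.0
```

The query sits equally far from its two nearest neighbours, so the estimate is their simple average; no regression coefficients are fitted at any point.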
Random Forests of Interaction Trees for Estimating Individualized Treatment Effects in Randomized Trials
Su, Xiaogang, Peña, Annette T., Liu, Lei, Levine, Richard A.
Assessing heterogeneous treatment effects is of growing interest in advancing precision medicine. Individualized treatment effects (ITE) play a critical role in such an endeavor. For experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees (Su et al., 2009). To this end, we first propose a smooth sigmoid surrogate (SSS) method as an alternative to greedy search to speed up tree construction. RFIT outperforms the traditional "separate regression" approach in estimating ITE. Furthermore, standard errors for the ITE estimated via RFIT can be obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial.
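Interaction trees choose splits that separate subgroups with different treatment effects. The sketch below assumes one such criterion, in the spirit of Su et al. (2009): the squared t statistic for the treatment-by-split interaction, i.e. the difference in (treated mean minus control mean) between the two children, divided by its pooled-variance standard error. The exhaustive loop over cut points is the greedy search that the SSS method is meant to replace; function names and toy data are hypothetical.

```python
def interaction_stat(y, trt, left):
    """Squared t-type statistic for the difference in treatment effect between two child nodes."""
    cells = {}
    for yi, ti, li in zip(y, trt, left):
        cells.setdefault((li, ti), []).append(yi)
    if len(cells) < 4 or any(len(v) < 2 for v in cells.values()):
        return None  # each child needs at least two treated and two control units
    means = {key: sum(v) / len(v) for key, v in cells.items()}
    # Pooled residual variance across the four (child, arm) cells
    sse = sum(sum((yi - means[key]) ** 2 for yi in v) for key, v in cells.items())
    sigma2 = sse / (len(y) - 4)
    effect_left = means[(True, 1)] - means[(True, 0)]
    effect_right = means[(False, 1)] - means[(False, 0)]
    se2 = sigma2 * sum(1.0 / len(v) for v in cells.values())
    return (effect_left - effect_right) ** 2 / se2


def best_split(x, y, trt):
    """Greedy search over cut points c for the split x <= c with the largest statistic."""
    best_cut, best_stat = None, -1.0
    for c in sorted(set(x))[:-1]:
        stat = interaction_stat(y, trt, [xi <= c for xi in x])
        if stat is not None and stat > best_stat:
            best_cut, best_stat = c, stat
    return best_cut, best_stat


# Toy trial: treatment (trt == 1) adds 2 to the outcome only when x >= 0.5
x = [i / 20 for i in range(20)]
trt = [1 if i % 4 in (1, 2) else 0 for i in range(20)]
y = [(2.0 if (trt[i] and x[i] >= 0.5) else 0.0) + 0.1 * ((i * 7) % 5 - 2) for i in range(20)]
cut, stat = best_split(x, y, trt)  # the cut at 0.45 isolates the effect-modifying region
```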
Ideas on interpreting machine learning
You've probably heard by now that machine learning algorithms can use big data to predict whether a donor will give to a charity, whether an infant in a NICU will develop sepsis, whether a customer will respond to an ad, and on and on. Machine learning can even drive cars and predict elections. Or can it? I believe it can, but recent high-profile hiccups should leave everyone who works with data (big or not) and machine learning algorithms asking themselves some very hard questions: Do I understand my data? Do I understand the model and the answers my machine learning algorithm is giving me? And do I trust these answers? Unfortunately, the complexity that bestows extraordinary predictive abilities on machine learning algorithms also makes the answers they produce hard to understand, and maybe even hard to trust. Although it is possible to enforce monotonicity constraints (a relationship that only changes in one direction) between independent variables and a machine-learned ...
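The simplest instance of a monotonicity constraint is isotonic regression in a single input: the least-squares fit that never decreases. The pool-adjacent-violators sketch below assumes the responses are already sorted by that input; it illustrates the idea only and is not how the article, or any particular machine learning library, enforces such constraints.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via pool adjacent violators (y sorted by the input)."""
    blocks = []  # each block is [sum of values, count]; its fitted value is sum / count
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the newest block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


fitted = isotonic_fit([1.0, 3.0, 2.0, 4.0])  # the violating pair (3, 2) is pooled to 2.5
```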
Switching nonparametric regression models for multi-curve data
de Souza, Camila P. E., Heckman, Nancy E., Xu, Helena
We develop and apply an approach for analyzing multi-curve data where each curve is driven by a latent state process. The state at any particular point determines a smooth function, forcing the individual curve to switch from one function to another. Thus each curve follows what we call a switching nonparametric regression model. We develop an EM algorithm to estimate the model parameters. We also obtain standard errors for the parameter estimates of the state process. We consider several types of state processes: independent and identically distributed; independent but dependent on a covariate; and Markov. Simulation studies show the frequentist properties of our estimates. We apply our methods to data on a building's power usage.
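As a rough sketch of the simplest case above (i.i.d. states), EM alternates between computing each observation's posterior state probabilities and refitting each state's smooth function with a responsibility-weighted smoother. The Gaussian-kernel Nadaraya-Watson smoother, the bandwidth h, the single curve, and the toy data are all assumptions made for illustration; the paper's estimator and its covariate-dependent and Markov variants are not reproduced.

```python
import numpy as np


def normal_pdf(r, var):
    return np.exp(-0.5 * r ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def weighted_nw(t, y, w, h):
    """Weighted Nadaraya-Watson smoother at the observed t (Gaussian kernel, bandwidth h)."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2) * w[None, :]
    return (K @ y) / K.sum(axis=1)


def em_iid_switching(t, y, f_init, h=0.05, n_iter=50):
    """EM for J smooth functions with i.i.d. latent states; returns probs, curves, variances."""
    f = np.array(f_init, dtype=float)  # shape (J, n): each state's function at every t_i
    J = f.shape[0]
    p = np.full(J, 1.0 / J)
    var = np.full(J, np.var(y))
    for _ in range(n_iter):
        # E-step: posterior probability that observation i was generated in state j
        dens = p[:, None] * normal_pdf(y[None, :] - f, var[:, None])
        resp = dens / dens.sum(axis=0)
        # M-step: state probabilities, then a responsibility-weighted smoother per state
        p = resp.mean(axis=1)
        for j in range(J):
            f[j] = weighted_nw(t, y, resp[j], h)
            var[j] = np.sum(resp[j] * (y - f[j]) ** 2) / resp[j].sum()
    return p, f, var


# Toy curve (hypothetical): the state switches between sin(2 pi t) and sin(2 pi t) + 3
rng = np.random.default_rng(1)
n = 400
t = np.sort(rng.uniform(0.0, 1.0, n))
state = rng.random(n) < 0.3  # True = "high" state
y = np.sin(2 * np.pi * t) + 3.0 * state + 0.2 * rng.standard_normal(n)
p, f, var = em_iid_switching(t, y, [np.full(n, y.min()), np.full(n, y.max())])
```

Initializing the two functions at the lowest and highest observed values is a crude but deterministic way to break the symmetry between states.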
Videos for Business Analytics using Data Mining course
Five years ago, in 2012, I decided to experiment in improving my teaching by creating a flipped classroom (and semi-MOOC) for my course "Business Analytics Using Data Mining" (BADM) at the Indian School of Business. I initially designed the course at the University of Maryland's Smith School of Business in 2005 and taught it until 2010. When I joined ISB in 2011, I started teaching multiple sections of BADM (which was started by Ravi Bapna in 2006), and the course was rapidly growing in popularity. Repeating the same lectures in multiple course sections made me realize it was time for scale! I therefore created 30 videos, covering various supervised methods (k-NN, linear and logistic regression, trees, naive Bayes, etc.) and unsupervised methods (principal components analysis, clustering, association rules), as well as important principles such as performance evaluation, the notion of a holdout set, and more.
Blog series: An introduction to using machine learning in marketing - for regression problems - Iridium
Time series forecasting, such as predicting sales values and volumes, can be a challenging problem, particularly when classical statistical methods do not cope well with the complexity of the data. Forecasting with these methods can be time-consuming, with results taking weeks or months to deliver. Machine Learning (ML) algorithms can detect complex relationships and trends within the data and create accurate sales forecasts in near real time. ML is therefore becoming an increasingly valuable tool for brand owners, enhancing their ability to make the right decisions faster. In the previous post in this series, titled "An introduction to using machine learning in marketing – for classification problems", we discussed the basics of machine learning, the Random Forest algorithm for classification problems, and how it can be used in Smart Marketing campaigns.
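Before a model such as a random forest can be applied, a sales series is usually reframed as a supervised regression problem. A minimal sketch under that assumption: build lag features (the previous n_lags values predict the next one) and split train and test data by time, so the model is never evaluated on periods earlier than those it was trained on. The helper names and sales figures are hypothetical and not taken from the blog series.

```python
def make_lag_features(series, n_lags):
    """Frame a univariate series as supervised regression: n_lags past values -> next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def time_split(X, y, test_size):
    """Chronological split: train on earlier periods, hold out the most recent test_size rows."""
    cut = len(X) - test_size
    return X[:cut], y[:cut], X[cut:], y[cut:]


sales = [10, 12, 13, 15, 14, 16]  # hypothetical weekly sales volumes
X, y = make_lag_features(sales, n_lags=2)
X_train, y_train, X_test, y_test = time_split(X, y, test_size=1)
```

Any regressor can then be fitted to (X_train, y_train); a shuffled split would instead leak future values into training.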
Estimating the coefficients of a mixture of two linear regressions by expectation maximization
Klusowski, Jason M., Yang, Dana, Brinda, W. D.
The Expectation-Maximization (EM) algorithm is a widely used technique for parameter estimation. It is an iterative procedure that monotonically increases the likelihood. When the likelihood is not concave, it is well known that EM can converge to a non-global optimum. However, recent work has sidestepped the question of whether EM reaches the likelihood maximizer, instead directly deriving statistical guarantees on its loss. These explorations have identified regions of initialization for which the EM estimate approaches the true parameter in probability, assuming the model is well specified. This line of research was spurred by [1], which established general conditions under which a ball centered at the true parameter is a basin of attraction for the population version of the EM operator. For a large enough sample size, the difference (in that ball) between the sample EM operator and the population EM operator can be bounded such that the EM estimate approaches the true parameter with high probability. That bound is the sum of two terms with distinct interpretations.
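For the symmetric two-component model y = R <theta, x> + noise, with R = +1 or -1 equally likely and Gaussian noise of known standard deviation sigma, the E-step reduces to E[R | x, y] = tanh(y <theta, x> / sigma^2), and the M-step is a least-squares solve: theta_new = (sum_i x_i x_i^T)^(-1) sum_i tanh(y_i <theta, x_i> / sigma^2) y_i x_i. The sketch below assumes this symmetric form and iterates the sample EM operator from a user-chosen initialization; names and data are hypothetical. Because the model is symmetric, theta is identifiable only up to sign.

```python
import numpy as np


def em_symmetric_mixture(X, y, theta0, sigma, n_iter=100):
    """Sample EM for y = R * <theta, x> + noise, R = +/-1 w.p. 1/2, noise ~ N(0, sigma^2)."""
    gram = X.T @ X
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior mean of each sign R_i given (x_i, y_i) and the current theta
        r = np.tanh(y * (X @ theta) / sigma ** 2)
        # M-step: least squares with the responses replaced by r_i * y_i
        theta = np.linalg.solve(gram, X.T @ (r * y))
    return theta


# Hypothetical well-specified data
rng = np.random.default_rng(0)
n, sigma = 2000, 0.5
theta_true = np.array([2.0, -1.0])
X = rng.standard_normal((n, 2))
R = rng.choice([-1.0, 1.0], size=n)
y = R * (X @ theta_true) + sigma * rng.standard_normal(n)
theta_hat = em_symmetric_mixture(X, y, theta0=[1.0, 0.0], sigma=sigma)
```

The initialization [1, 0] has a positive inner product with theta_true, i.e. it starts in a favourable region of the kind the analysis above characterizes.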
Statistical Inference for Machine Learning Inverse Probability Weighting with Survival Outcomes
Inverse probability weighting (IPW) is an important estimation technique for studies with missing outcome data and for causal inference from observational studies. In survival analysis under right censoring, inverse weighting by the conditional probability of remaining uncensored given covariates can be used to adjust for informative censoring; the conditional censoring distribution is henceforth referred to as the censoring mechanism. Since the censoring mechanism is often unknown, it must be estimated from data. Asymptotic properties of the IPW estimator, such as consistency and its large-sample distribution, therefore depend on the large-sample behavior of the estimator of the censoring mechanism. In low-dimensional problems with categorical covariates, the nonparametric maximum likelihood estimator (NPMLE) may be employed. In moderate to high dimensions or with continuous covariates, the curse of dimensionality precludes the use of the NPMLE, making smoothing techniques necessary.
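In the simplest low-dimensional case (no covariates, so censoring is assumed independent of everything else), the censoring survival function G(t) = P(C > t) can be estimated by the Kaplan-Meier estimator, which is the NPMLE, with censorings treated as the "events". The sketch below uses it to form an IPCW estimate of P(T <= t), weighting each observed event by 1 / G(T_i-). It is O(n^2) for clarity, is written for untied times, and is not the machine-learning-based estimator the paper studies; with covariates, G would be replaced by a conditional estimate.

```python
def censoring_survival_before(times, events, s):
    """Kaplan-Meier estimate of G(s-) = P(C >= s), treating censorings (event == 0) as events."""
    g = 1.0
    cens_times = sorted({ti for ti, ei in zip(times, events) if ei == 0 and ti < s})
    for c in cens_times:
        at_risk = sum(1 for tj in times if tj >= c)
        n_cens = sum(1 for tj, ej in zip(times, events) if tj == c and ej == 0)
        g *= 1.0 - n_cens / at_risk
    return g


def ipcw_cdf(times, events, t):
    """IPCW estimate of P(T <= t): each observed event up to t is up-weighted by 1 / G(T_i-)."""
    total = sum(
        1.0 / censoring_survival_before(times, events, ti)
        for ti, ei in zip(times, events)
        if ei == 1 and ti <= t
    )
    return total / len(times)


times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 1]  # 1 = event observed, 0 = right-censored
F3 = ipcw_cdf(times, events, 3)  # 7/15
```

In this small example the estimate of P(T <= 3) is 7/15, which matches 1 minus the Kaplan-Meier estimate S(3) = 0.8 x 2/3.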
Multivariate Regression with Gross Errors on Manifold-valued Data
Zhang, Xiaowei, Shi, Xudong, Sun, Yu, Cheng, Li
We consider multivariate regression with manifold-valued output; that is, for a multivariate observation, the output response lies on a manifold. We propose a new regression model to handle grossly corrupted manifold-valued responses, a bottleneck issue commonly encountered in practice. Our model first corrects the grossly corrupted responses via geodesic curves on the manifold and then performs multivariate linear regression on the corrected data. This results in a nonconvex and nonsmooth optimization problem on manifolds. To solve it, we propose a dedicated approach named PALMR, which utilizes and extends proximal alternating linearized minimization techniques. Theoretically, we show that it converges to a critical point under mild conditions. Empirically, we test our model on both synthetic and real diffusion tensor imaging data and show that it outperforms other multivariate regression models when manifold-valued responses contain gross errors, and that it is effective in identifying gross errors.
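PALMR itself works on a manifold with geodesic corrections. As a Euclidean stand-in (an assumption, not the paper's algorithm), the sketch below applies proximal alternating linearized minimization (PALM) to minimizing 0.5 * ||X B + E - Y||_F^2 + lam * ||E||_1 over B and E, where the sparse matrix E absorbs gross errors. Each block takes a gradient step of size 1 / (gamma * L), with L the block's Lipschitz constant and gamma > 1, followed by that block's proximal map (identity for B, soft-thresholding for E). Names, lam, and data are hypothetical.

```python
import numpy as np


def soft_threshold(Z, tau):
    """Proximal operator of tau * ||Z||_1 (entrywise soft-thresholding)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def palm_robust_regression(X, Y, lam, gamma=1.1, n_iter=1000):
    """PALM for min_{B,E} 0.5*||X B + E - Y||_F^2 + lam*||E||_1 (Euclidean responses only)."""
    B = np.zeros((X.shape[1], Y.shape[1]))
    E = np.zeros_like(Y)
    L_B = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient in B
    L_E = 1.0                        # Lipschitz constant of the gradient in E
    for _ in range(n_iter):
        # Block B: plain gradient step (no nonsmooth term acts on B)
        B = B - X.T @ (X @ B + E - Y) / (gamma * L_B)
        # Block E: linearized gradient step followed by the l1 proximal map
        step = 1.0 / (gamma * L_E)
        E = soft_threshold(E - step * (X @ B + E - Y), lam * step)
    return B, E


# Hypothetical data with 10 grossly corrupted rows
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
B_true = rng.standard_normal((3, 2))
Y = X @ B_true + 0.01 * rng.standard_normal((200, 2))
bad = np.arange(0, 200, 20)
Y[bad] += 10.0
B_hat, E_hat = palm_robust_regression(X, Y, lam=0.5)
flagged = np.where(np.abs(E_hat).sum(axis=1) > 1.0)[0]  # rows identified as gross errors
```

Rows whose estimated E is clearly nonzero are the ones flagged as gross errors, mirroring the identification role that the correction step plays in the manifold model.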