Regression
A Unified Framework for Long Range and Cold Start Forecasting of Seasonal Profiles in Time Series
Xie, Christopher, Tank, Alex, Greaves-Tunnell, Alec, Fox, Emily
Providing long-range forecasts is a fundamental challenge in time series modeling, which is only compounded by the challenge of having to form such forecasts when a time series has never previously been observed. The latter challenge is the time series version of the cold-start problem seen in recommender systems which, to our knowledge, has not been directly addressed in previous work. In addition, modern time series datasets are often plagued by missing data. We focus on forecasting seasonal profiles---or baseline demand---for periods on the order of a year long, even in the cold-start setting or with otherwise missing data. Traditional time series approaches that perform iterated step-ahead methods struggle to provide accurate forecasts on such problems, let alone in the missing data regime. We present a computationally efficient framework which combines ideas from high-dimensional regression and matrix factorization on a carefully constructed data matrix. Key to our formulation and resulting performance is (1) leveraging repeated patterns over fixed periods of time and across series, and (2) metadata associated with the individual series. We provide analyses of our framework on large messy real-world datasets.
TensorFlow 101
TensorFlow is an open source machine learning library developed at Google. TensorFlow uses data flow graphs for numerical computations. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. In this post we will learn very basics of TensorFlow and we will build a Logistic Regression model using TensorFlow. The lowest level API - TensorFlow Core, provides you with complete programming control.
Learn Generalized Linear Models (GLM) using R
Generalized Linear Model (GLM) helps represent the dependent variable as a linear combination of independent variables. Simple linear regression is the traditional form of GLM. Simple linear regression works well when the dependent variable is normally distributed. The assumption of normally distributed dependent variable is often violated in real situations. For example, consider a case where dependent variable can take only positive values and has fat tail.
Generalized Concomitant Multi-Task Lasso for sparse multimodal regression
Massias, Mathurin, Fercoq, Olivier, Gramfort, Alexandre, Salmon, Joseph
In high dimension, it is customary to consider Lasso-type estimators to enforce sparsity. For standard Lasso theory to hold, the regularization parameter should be proportional to the noise level, yet the latter is generally unknown in practice. A possible remedy is to consider estimators, such as the Concomitant/Scaled Lasso, which jointly optimize over the regression coefficients as well as over the noise level, making the choice of the regularization independent of the noise level. However, when data from different sources are pooled to increase sample size, or when dealing with multimodal datasets, noise levels typically differ and new dedicated estimators are needed. In this work we provide new statistical and computational solutions to deal with such heteroscedastic regression models, with an emphasis on functional brain imaging with combined magneto- and electroencephalographic (M/EEG) signals. Adopting the formulation of Concomitant Lasso-type estimators, we propose a jointly convex formulation to estimate both the regression coefficients and the (square root of the) noise covariance. When our framework is instantiated to de-correlated noise, it leads to an efficient algorithm whose computational cost is not higher than for the Lasso and Concomitant Lasso, while addressing more complex noise structures. Numerical experiments demonstrate that our estimator yields improved prediction and support identification while correctly estimating the noise (square root) covariance. Results on multimodal neuroimaging problems with M/EEG data are also reported.
Revenue-based Attribution Modeling for Online Advertising
Zhao, Kaifeng, Mahboobi, Seyed Hanif, Bagheri, Saeed
This paper examines and proposes several attribution modeling methods that quantify how revenue should be attributed to online advertising inputs. We adopt and further develop relative importance method, which is based on regression models that have been extensively studied and utilized to investigate the relationship between advertising efforts and market reaction (revenue). Relative importance method aims at decomposing and allocating marginal contributions to the coefficient of determination (R^2) of regression models as attribution values. In particular, we adopt two alternative submethods to perform this decomposition: dominance analysis and relative weight analysis. Moreover, we demonstrate an extension of the decomposition methods from standard linear model to additive model. We claim that our new approaches are more flexible and accurate in modeling the underlying relationship and calculating the attribution values. We use simulation examples to demonstrate the superior performance of our new approaches over traditional methods. We further illustrate the value of our proposed approaches using a real advertising campaign dataset.
Top 6 errors novice machine learning engineers make
In machine learning, there are many ways to build a product or solution and each way assumes something different. Many times, it's not obvious how to navigate and identify which assumptions are reasonable. People new to machine learning make mistakes, which in hindsight will often feel silly. I've created a list of the top mistakes that novice machine learning engineers make. Hopefully, you can learn from these common errors and create more robust solutions that bring real value.
Fair Kernel Learning
Pérez-Suay, Adrián, Laparra, Valero, Mateo-García, Gonzalo, Muñoz-Marí, Jordi, Gómez-Chova, Luis, Camps-Valls, Gustau
New social and economic activities massively exploit big data and machine learning algorithms to do inference on people's lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that not including sensitive features that bias fairness, such as gender or race, is not enough to mitigate the discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more efficient. We present novel fair regression and dimensionality reduction methods built on a previously proposed fair classification framework. Both methods rely on using the Hilbert Schmidt independence criterion as the fairness term. Unlike previous approaches, this allows us to simplify the problem and to use multiple sensitive variables simultaneously. Replacing the linear formulation by kernel functions allows the methods to deal with nonlinear problems. For both linear and nonlinear formulations the solution reduces to solving simple matrix inversions or generalized eigenvalue problems. This simplifies the evaluation of the solutions for different trade-off values between the predictive error and fairness terms. We illustrate the usefulness of the proposed methods in toy examples, and evaluate their performance on real world datasets to predict income using gender and/or race discrimination as sensitive variables, and contraceptive method prediction under demographic and socio-economic sensitive descriptors.
MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?
Liu, Yu, Zhang, Hantian, Zeng, Luyuan, Wu, Wentao, Zhang, Ce
We conduct an empirical study of machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: Instead of specifying how to run a machine learning task, users only specify what machine learning task to run and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free -- a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question with a focus on binary classification problems. We present mlbench, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on mlbench. Our comparative study reveals the strength and weakness of existing machine learning clouds and points out potential future directions for improvement.
L1 and L2 Regularization Methods – Towards Data Science – Medium
In my last post, I covered the introduction to Regularization in supervised learning models. In this post, let's go over some of the regularization techniques widely used and the key difference between those. A regression model that uses L1 regularization technique is called Lasso Regression and model which uses L2 is called Ridge Regression. The key difference between these two is the penalty term. Ridge regression adds "squared magnitude" of coefficient as penalty term to the loss function.
Facial Keypoints Detection
Detect facial keypoints is a critical element in face recognition. However, there is difficulty to catch keypoints on the face due to complex influences from original images, and there is no guidance to suitable algorithms. In this paper, we study different algorithms that can be applied to locate keyponits. Specifically: our framework (1)prepare the data for further investigation (2)Using PCA and LBP to process the data (3) Apply different algorithms to analysis data, including linear regression models, tree based model, neural network and convolutional neural network, etc. Finally we will give our conclusion and further research topic. A comprehensive set of experiments on dataset demonstrates the effectiveness of our framework.