 Regression


Nuclear Norm Regularized Estimation of Panel Regression Models

arXiv.org Machine Learning

In this paper we investigate panel regression models with interactive fixed effects. We propose two new estimation methods that are based on minimizing convex objective functions. The first method minimizes the sum of squared residuals with a nuclear (trace) norm regularization. The second method minimizes the nuclear norm of the residuals. We establish the consistency of the two resulting estimators. Those estimators have a very important computational advantage compared to the existing least squares (LS) estimator, in that they are defined as minimizers of a convex objective function. In addition, the nuclear norm penalization helps to resolve a potential identification problem for interactive fixed effect models, in particular when the regressors are low-rank and the number of the factors is unknown. We also show how to construct estimators that are asymptotically equivalent to the least squares (LS) estimator in Bai (2009) and Moon and Weidner (2017) by using our nuclear norm regularized or minimized estimators as initial values for a finite number of LS minimizing iteration steps. This iteration avoids any non-convex minimization, while the original LS estimation problem is generally non-convex, and can have multiple local minima.
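The two convex objectives described above can be written out as a sketch. The notation here is an assumption, not taken from the abstract: $Y$ is the $N \times T$ outcome matrix, $X_k$ are regressor matrices of the same size, $\beta$ is the coefficient vector, $\Gamma$ is a matrix standing in for the interactive fixed effects, $\psi > 0$ is a penalty parameter, $\|\cdot\|_F$ is the Frobenius norm, and $\|\cdot\|_*$ is the nuclear norm (the sum of singular values). Any scaling constants the paper may use are omitted.

$$
\hat\beta_{\psi} \in \arg\min_{\beta} \, \min_{\Gamma} \; \tfrac{1}{2} \Big\| Y - \sum_{k} \beta_k X_k - \Gamma \Big\|_F^2 + \psi \, \|\Gamma\|_*
\qquad\text{and}\qquad
\hat\beta_{*} \in \arg\min_{\beta} \; \Big\| Y - \sum_{k} \beta_k X_k \Big\|_*
$$

Both objectives are convex because each term is a norm, or a squared Frobenius norm, of an expression that is affine in $(\beta, \Gamma)$.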


Efficient Load Sampling for Worst-Case Structural Analysis Under Force Location Uncertainty

arXiv.org Machine Learning

An important task in structural design is to quantify the structural performance of an object under the external forces it may experience during its use. The problem is computationally very challenging because the external forces' contact locations and magnitudes may vary significantly. We present an efficient analysis approach to determine the most critical force contact location in such problems with force location uncertainty. Given an input 3D model and regions on its boundary where arbitrary normal forces may make contact, our algorithm predicts the worst-case force configuration responsible for creating the highest stress within the object. Our approach uses a computationally tractable experimental design method to select a number of sample force locations based on geometry only, without inspecting the stress response, which requires computationally expensive finite-element analysis. Then, we construct a simple regression model on these samples and the corresponding maximum stresses. Combined with a simple ranking-based post-processing step, our method provides a practical solution to the worst-case structural analysis problem. The results indicate that our approach achieves significant improvements over existing work and brute-force approaches. We demonstrate that further speed-up can be obtained when a small error tolerance in maximum stress is allowed.


The Skills That Data Analysts Need to Master - DZone Big Data

#artificialintelligence

This seems very simple, but, in fact, it's not. Excel can not only handle simple two-dimensional tables and complex nested tables, but also create line charts, column charts, bar charts, area charts, pie charts, radar charts, combo charts, and scatter charts. Even if you are a business analyst who can rely on IT and IT tools (such as a multi-dimensional BI analysis model), sometimes you can't get the data you want. These skills will definitely attract the attention of senior leaders, as they allow them to understand at a glance, and gain insight into, the essence of the business. Summary: At this point, if you've mastered 80% of the above skills, you can be considered a qualified analyst.


Machine Learning for CEOs

#artificialintelligence

When I worked as a McKinsey consultant, I served the CEO of a bank regarding his small business strategy. I wanted to run regressions on the bank's data but I was advised against it: "They don't even understand statistics. How are you going to explain a regression to them?". CEOs have always needed to deeply understand human intelligence and emotion to manage enterprise teams. Now machines and algorithms are increasingly becoming part of these very teams.


Cadre Modeling: Simultaneously Discovering Subpopulations and Predictive Models

arXiv.org Machine Learning

We consider the problem in regression analysis of identifying subpopulations that exhibit different patterns of response, where each subpopulation requires a different underlying model. Unlike statistical cohorts, these subpopulations are not known a priori; thus, we refer to them as cadres. When the cadres and their associated models are interpretable, modeling leads to insights about the subpopulations and their associations with the regression target. We introduce a discriminative model that simultaneously learns cadre assignment and target-prediction rules. Sparsity-inducing priors are placed on the model parameters, under which independent feature selection is performed for both the cadre assignment and target-prediction processes. We learn models using adaptive step size stochastic gradient descent, and we assess cadre quality with bootstrapped sample analysis. We present simulated results showing that, when the true clustering rule does not depend on the entire set of features, our method significantly outperforms methods that learn subpopulation-discovery and target-prediction rules separately. In a materials-by-design case study, our model provides state-of-the-art prediction of polymer glass transition temperature. Importantly, the method identifies cadres of polymers that respond differently to structural perturbations, thus providing design insight for targeting or avoiding specific transition temperature ranges. It identifies chemically meaningful cadres, each with interpretable models. Further experimental results show that cadre methods have generalization that is competitive with linear and nonlinear regression models and can identify robust subpopulations.


Model Selection Techniques -- An Overview

arXiv.org Machine Learning

In the era of "big data", analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to bring a comprehensive overview of them, in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection. Vast developments in hardware storage, precision instrument manufacture, economic globalization, etc. have generated huge volumes of data that can be analyzed to extract useful information. Typical statistical inference or machine learning procedures learn from and make predictions on data by fitting parametric or nonparametric models (in a broad sense). However, there exists no model that is universally suitable for any data and goal. Therefore, a crucial step in a typical data analysis is to consider a set of candidate models (referred to as the model class), and then select the most appropriate one. In other words, model selection is the task of selecting a statistical model from a model class, given a set of data. There have been many overview papers on model selection scattered in the communities of signal processing [1], statistics [2], machine learning [3], epidemiology [4], chemometrics [5], ecology and evolution [6]. Despite the abundant literature on model selection, existing overviews usually focus on derivations, descriptions, or applications of particular model selection principles.

This research was funded in part by the Defense Advanced Research Projects Agency (DARPA) under grant number W911NF-18-1-0134. J. Ding and Y. Yang are with the School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, United States. V. Tarokh is with the Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708, United States.
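To make "selecting a model from a model class" concrete, here is a minimal sketch that compares two candidates on a toy dataset. It is not taken from the article: the Akaike information criterion (AIC) is used only as one familiar selection rule, and the data, the two candidate models, and all function names are illustrative assumptions.

```python
import math


def rss_constant(ys):
    """Residual sum of squares for the intercept-only model."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def rss_linear(xs, ys):
    """Residual sum of squares for simple linear regression (closed-form OLS)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def gaussian_aic(rss, n, k):
    """AIC for a Gaussian model, up to an additive constant; k counts parameters."""
    return n * math.log(rss / n) + 2 * k


# Toy data (assumed): a linear trend plus small fixed perturbations.
xs = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]

# k includes the noise variance: mean + variance, or slope + intercept + variance.
scores = {
    "constant": gaussian_aic(rss_constant(ys), len(ys), k=2),
    "linear": gaussian_aic(rss_linear(xs, ys), len(ys), k=3),
}
selected = min(scores, key=scores.get)
```

The candidate with the lowest score is selected. Other criteria would plug into the same loop by swapping the scoring function.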


Ensemble Method for Censored Demand Prediction

arXiv.org Machine Learning

Many economic applications, including optimal pricing and inventory management, require prediction of demand based on sales data and estimation of the sales reaction to a price change. There is a wide range of econometric approaches that are used to correct bias in estimates of demand parameters on censored sales data. These approaches can also be applied to various classes of machine learning models to reduce the prediction error of sales volume. In this study we construct two ensemble models for demand prediction, with and without accounting for demand censorship. Accounting for sales censorship is based on the idea of the censored quantile regression method, where model estimation is split into two separate parts: a) prediction of zero sales by a classification model; and b) prediction of non-zero sales by a regression model. Models with and without accounting for censorship are based on aggregating the predictions of least squares, Ridge and Lasso regressions and a Random Forest model. Having estimated the predictive properties of both models, we empirically test whether the model that takes into account the censored nature of demand has the better predictive power. We also show that machine learning methods with censorship accounting provide bias-corrected estimates of demand sensitivity to price changes similar to econometric models.
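The two-part split described above (a classifier for zero sales, a regression for non-zero sales) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' ensemble: it uses a single feature (price), a logistic classifier fitted by plain gradient descent, ordinary least squares for the positive part, and made-up data. All names are hypothetical.

```python
import math


def fit_logistic_1d(xs, labels, steps=20000, lr=0.05):
    """Fit P(label=1 | x) = sigmoid(a + b*x) by gradient descent on the log loss."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, label in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += (p - label) / n
            grad_b += (p - label) * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def fit_ols_1d(xs, ys):
    """Closed-form simple linear regression, returning (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_two_part(prices, sales):
    """Part a) classify zero vs non-zero sales; part b) regress non-zero sales."""
    nonzero = [1 if s > 0 else 0 for s in sales]
    classifier = fit_logistic_1d(prices, nonzero)
    positive = [(p, s) for p, s in zip(prices, sales) if s > 0]
    regressor = fit_ols_1d([p for p, _ in positive], [s for _, s in positive])
    return classifier, regressor


def predict_demand(model, price):
    """Expected sales = P(non-zero sales) * predicted sales given non-zero."""
    (a, b), (intercept, slope) = model
    p_nonzero = 1.0 / (1.0 + math.exp(-(a + b * price)))
    return p_nonzero * max(0.0, intercept + slope * price)


# Made-up sales records: zeros become more frequent as price rises.
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
sales = [20.0, 18.0, 15.0, 14.0, 0.0, 10.0, 0.0, 0.0, 6.0, 0.0]
model = fit_two_part(prices, sales)
low_price_demand = predict_demand(model, 1.0)
high_price_demand = predict_demand(model, 10.0)
```

Multiplying the two parts is one common way to combine them. The abstract does not say how the authors aggregate the classifier and the regressor, so treat that step as an assumption.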


Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods

arXiv.org Machine Learning

Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is non-separable, but fails to exist when the data is separable. First-order methods are the algorithms of choice for solving large-scale instances of the logistic regression problem. In this paper, we introduce a pair of condition numbers that measure the degree of non-separability or separability of a given dataset in the setting of binary classification, and we study how these condition numbers relate to and inform the properties and the convergence guarantees of first-order methods. When the training data is non-separable, we show that the degree of non-separability naturally enters the analysis and informs the properties and convergence guarantees of two standard first-order methods: steepest descent (for any given norm) and stochastic gradient descent. Expanding on the work of Bach, we also show how the degree of non-separability enters into the analysis of linear convergence of steepest descent (without needing strong convexity), as well as the adaptive convergence of stochastic gradient descent. When the training data is separable, first-order methods rather curiously have good empirical success, which is not well understood in theory. In the case of separable data, we demonstrate how the degree of separability enters into the analysis of $\ell_2$ steepest descent and stochastic gradient descent for delivering approximate-maximum-margin solutions with associated computational guarantees as well. This suggests that first-order methods can lead to statistically meaningful solutions in the separable case, even though the ML solution does not exist.
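The separable case can be seen on a toy example. The sketch below runs plain gradient descent (the $\ell_2$ steepest descent case) on the logistic loss for four linearly separable points. The dataset, step size, and function names are illustrative assumptions, and nothing here reproduces the paper's condition numbers or guarantees.

```python
import math


def sigmoid_neg(margin):
    """Numerically stable value of 1 / (1 + exp(margin))."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def gradient_descent(X, y, step, iterations):
    """Minimize the average logistic loss; return final weights and norm history."""
    w = [0.0] * len(X[0])
    norms = []
    for _ in range(iterations):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            weight = sigmoid_neg(margin)  # size of this point's pull on w
            for j, xj in enumerate(xi):
                grad[j] -= weight * yi * xj / len(X)
        w = [wj - step * gj for wj, gj in zip(w, grad)]
        norms.append(math.sqrt(sum(wj * wj for wj in w)))
    return w, norms


# Separable toy data (assumed): labels are +1 or -1.
X = [(2.0, 1.0), (1.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]
w, norms = gradient_descent(X, y, step=0.5, iterations=2000)
margins = [yi * sum(wj * xj for wj, xj in zip(w, xi)) for xi, yi in zip(X, y)]
direction = [wj / norms[-1] for wj in w]
```

On this dataset the weight norm keeps growing with more iterations, since no finite minimizer exists, while the normalized `direction` settles along the symmetric separating direction. That is the sense in which the iterates stay useful even though the ML estimator does not exist.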


Quantile Regression Under Memory Constraint

arXiv.org Machine Learning

This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the naïve divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch, and then aggregates the estimators via averaging. However, this method only works when $n=o(m^2)$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem in a distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.
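The naïve divide-and-conquer baseline described above can be sketched for the simplest special case, an intercept-only model, where the local QR estimator reduces to a sample quantile. This illustrates the baseline only, not the paper's proposed refinement. The simulated data, batch size, and function names are assumptions.

```python
import random


def sample_quantile(values, tau):
    """Order-statistic estimate of the tau-quantile."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[k]


def naive_divide_and_conquer(data, m, tau):
    """Split into batches of size m, estimate locally, then average."""
    batches = [data[i:i + m] for i in range(0, len(data), m)]
    local_estimates = [sample_quantile(batch, tau) for batch in batches]
    return sum(local_estimates) / len(local_estimates)


# Simulated data (assumed): n = 10000 standard normal draws, batches of m = 100.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10000)]
dc_estimate = naive_divide_and_conquer(data, m=100, tau=0.5)
full_estimate = sample_quantile(data, 0.5)
```

Only one batch has to be held in memory at a time. Each local estimate carries a small bias that averaging does not remove, which fits the abstract's remark that the naïve approach needs $n=o(m^2)$.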


Prediction of Atomization Energy Using Graph Kernel and Active Learning

arXiv.org Machine Learning

Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaption, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. We discuss why the graph kernel, paired with a graph representation of the molecules, is particularly useful for predicting extensive properties. We demonstrate that, using an active learning procedure, the proposed method can achieve a mean absolute error of less than 1.0 kcal/mol on the QM7 data set using as few as 1200 training samples and 1 hour of training time. This is a demonstration, in contrast to common belief, that regression models based on kernel methods can be simultaneously accurate and fast predictors.