Regression
Homomorphic Sensing of Subspace Arrangements
Peng, Liangzu, Wang, Boshi, Tsakiris, Manolis C.
Homomorphic sensing is a recent algebraic-geometric framework that studies the unique recovery of points in a linear subspace from their images under a given collection of linear transformations. It has been successful in interpreting such a recovery in the case of permutations composed by coordinate projections, an important instance in applications known as unlabeled sensing, which models data that are out of order and have missing values. In this paper we make several fundamental contributions. First, we extend the homomorphic sensing framework from a single subspace to a subspace arrangement. Second, when specialized to a single subspace the new conditions are simpler and tighter. Third, as a natural consequence of our main theorem we obtain in a unified way recovery conditions for real phase retrieval, typically known via diverse techniques in the literature, as well as novel conditions for sparse and unsigned versions of linear regression without correspondences and unlabeled sensing. Finally, we prove that the homomorphic sensing property is locally stable to noise.
Geometric Interpretation of Linear Regression
Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). In geometric interpretation terms, the linear regression algorithm tries to find a plane or line that best fits the data points as well as possible. Linear regression is a regression technique that predicts real value. What does the term "finding plane that best fits the data points" mean? For the above-given sample 2-dimension dataset (Image 1), the general equation of the line that covers as numbers of points as possible is y m*x c, where m is the slope of the line, and c is the intercept term.
CLAIMED: A CLAssification-Incorporated Minimum Energy Design to explore a multivariate response surface with feasibility constraints
Song, Yao, Sengul, Mert Y., He, Linglin, van Duin, Adri C. T., Hung, Ying, Dasgupta, Tirthankar
Motivated by the problem of optimization of force-field systems in physics using large-scale computer simulations, we consider exploration of a deterministic complex multivariate response surface. The objective is to find input combinations that generate output close to some desired or "target" vector. In spite of reducing the problem to exploration of the input space with respect to a one-dimensional loss function, the search is nontrivial and challenging due to infeasible input combinations, high dimensionalities of the input and output space and multiple "desirable" regions in the input space and the difficulty of emulating the objective function well with a surrogate model. We propose an approach that is based on combining machine learning techniques with smart experimental design ideas to locate multiple good regions in the input space.
A Semiparametric Approach to Interpretable Machine Learning
Sani, Numair, Lee, Jaron, Nabi, Razieh, Shpitser, Ilya
Black-box models in machine learning have demonstrated excellent predictive performance in complex problems and high-dimensional settings. However, their lack of transparency and interpretability restrict the applicability of such models in critical decision-making processes. In order to combat this shortcoming, we propose a novel approach to trading off interpretability and performance in prediction models using ideas from semiparametric statistics, allowing us to combine the interpretability of parametric regression models with performance of nonparametric methods. We achieve this by utilizing a two-piece model: the first piece is interpretable and parametric, to which a second, uninterpretable residual piece is added. The performance of the overall model is optimized using methods from the sufficient dimension reduction literature. Influence function based estimators are derived and shown to be doubly robust. This allows for use of approaches such as Double Machine Learning in estimating our model parameters. We illustrate the utility of our approach via simulation studies and a data application based on predicting the length of stay in the intensive care unit among surgery patients.
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
Liao, Zhenyu, Couillet, Romain, Mahoney, Michael W.
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel matrix (as it does when $N \to \infty$ alone), but it still has a tractable behavior that is captured by our analysis. This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$. Based on these estimates, a precise characterization of two qualitatively different phases of learning, including the phase transition between them, is provided; and the corresponding double descent test error curve is derived from this phase transition behavior. These results do not depend on strong assumptions on the data distribution, and they perfectly match empirical results on finite-dimensional real-world data sets.
The Penalty Imposed by Ablated Data Augmentation
Liu, Frederick, Najmi, Amir, Sundararajan, Mukund
There is a set of data augmentation techniques that ablate parts of the input at random. These include input dropout, cutout, and random erasing. We term these techniques ablated data augmentation. Though these techniques seems similar in spirit and have shown success in improving model performance in a variety of domains, we do not yet have a mathematical understanding of the differences between these techniques like we do for other regularization techniques like L1 or L2. First, we study a formal model of mean ablated data augmentation and inverted dropout for linear regression. We prove that ablated data augmentation is equivalent to optimizing the ordinary least squares objective along with a penalty that we call the Contribution Covariance Penalty and inverted dropout, a more common implementation than dropout in popular frameworks, is equivalent to optimizing the ordinary least squares objective along with Modified L2. For deep networks, we demonstrate an empirical version of the result if we replace contributions with attributions and coefficients with average gradients, i.e., the Contribution Covariance Penalty and Modified L2 Penalty drop with the increase of the corresponding ablated data augmentation across a variety of networks.
Sparse learning with CART
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART methodology. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and Pearson correlation to show that CART with cost-complexity pruning achieves an optimal complexity/goodnessof-fit tradeoff when the depth scales with the logarithm of the sample size. Data dependent quantities, which adapt to the dimensionality and latent structure of the regression model, are seen to govern the rates of convergence of the prediction error.
The Expected Jacobian Outerproduct: Theory and Empirics
The expected gradient outerproduct (EGOP) of an unknown regression function is an operator that arises in the theory of multi-index regression, and is known to recover those directions that are most relevant to predicting the output. However, work on the EGOP, including that on its cheap estimators, is restricted to the regression setting. In this work, we adapt this operator to the multi-class setting, which we dub the expected Jacobian outerproduct (EJOP). Moreover, we propose a simple rough estimator of the EJOP and show that somewhat surprisingly, it remains statistically consistent under mild assumptions. Furthermore, we show that the eigenvalues and eigenspaces also remain consistent. Finally, we show that the estimated EJOP can be used as a metric to yield improvements in real-world non-parametric classification tasks: both by its use as a metric, and also as cheap initialization in metric learning tasks.
Think out of the package: Recommending package types for e-commerce shipments
Gurumoorthy, Karthik S., Sanyal, Subhajit, Chaoji, Vineet
Multiple product attributes like dimensions, weight, fragility, liquid content etc. determine the package type used by e-commerce companies to ship products. Sub-optimal package types lead to damaged shipments, incurring huge damage related costs and adversely impacting the company's reputation for safe delivery. Items can be shipped in more protective packages to reduce damage costs, however this increases the shipment costs due to expensive packaging and higher transportation costs. In this work, we propose a multi-stage approach that trades-off between shipment and damage costs for each product, and accurately assigns the optimal package type using a scalable, computationally efficient linear time algorithm. A simple binary search algorithm is presented to find the hyper-parameter that balances between the shipment and damage costs. Our approach when applied to choosing package type for Amazon shipments, leads to significant cost savings of tens of millions of dollars in emerging marketplaces, by decreasing both the overall shipment cost and the number of in-transit damages. Our algorithm is live and deployed in the production system where, package types for more than 130,000 products have been modified based on the model's recommendation, realizing a reduction in damage rate of 24%.
Median regression with differential privacy
Personal privacy information may be exposed with the unprecedented availability of datasets, so there is increasing requirement that statistical analysis of such datasets should protect individual privacy. As [6] describes, differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population. Over the past few years, differential privacy has been investigated in machine learning [1] and has been applied in the real world, see for example [8]. Recently, [3] formulates a general lower bound argument for minimax risks with differential privacy constraints, and applies this argument to high-dimensional mean estimation and linear regression problems. In this paper, three privacy preserving methods are proposed for median regression, which is a special case of quantile regression. Quantile regression was first introduced in [12], which aims to estimate and conduct inference about conditional quantile functions. In recent years, quantile regression has become a comprehensive method for statistical analysis of response models and it has been widely used in reality, such as survival analysis and economics, see for example, [14], [20] and [15]. The fact that the median regression takes least absolute deviation as its objective function to estimate parameters has been known among statisticians [12].