Principal Component Regression
Calibrated Principal Component Regression
Yixuan Florence Wu, Yilun Zhu, Lei Cao, Naichen Shi
We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside the retained principal components (PCs). To mitigate this bias, we propose Calibrated Principal Component Regression (CPCR), which first learns a low-variance prior in the PC subspace and then calibrates the model in the original feature space via a centered Tikhonov step. CPCR leverages cross-fitting and controls the truncation bias by softening PCR's hard cutoff. Theoretically, we characterize the out-of-sample risk in the random-matrix regime and show that CPCR outperforms standard PCR when the regression signal has non-negligible components in low-variance directions. Empirically, CPCR consistently improves prediction across multiple overparameterized problems. The results highlight CPCR's stability and flexibility in modern overparameterized settings.
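The two-stage idea described in the abstract can be sketched in a few lines. The following is a minimal, illustrative implementation (not the authors' code): `pcr_fit` projects the centered covariates onto the top-k principal directions and runs least squares there, and `cpcr_fit` then takes a ridge (Tikhonov) step in the full feature space centered at the PCR solution, which softens PCR's hard cutoff. The function names and the single-fold setup are assumptions for illustration; the paper's cross-fitting is omitted.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: OLS on the top-k principal scores."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                       # (p, k) top-k principal directions
    Z = Xc @ V                         # principal scores
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return V @ gamma                   # coefficients mapped back to feature space

def cpcr_fit(X, y, k, lam):
    """CPCR-style calibration (sketch): treat the PCR fit as a low-variance
    prior, then solve a ridge problem on its residual in the full space."""
    beta0 = pcr_fit(X, y, k)           # prior from the PC subspace
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = yc - Xc @ beta0                # residual left by the truncated fit
    p = X.shape[1]
    # Centered Tikhonov step: (Xc'Xc + lam I) delta = Xc' r
    delta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ r)
    return beta0 + delta
```

In the paper's method, `beta0` and the calibration step are estimated on disjoint folds (cross-fitting); the single-sample version above is only meant to show where the truncation bias gets corrected.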
This paper proposes a simple latent factor model for one-shot learning with continuous outputs, where very few observations are available. Specifically, it derives risk approximations in an asymptotic regime where the number of training examples is fixed and the number of features in the X space diverges. Building on the principal component regression (PCR) estimator, two estimators, a bias-corrected estimator and a so-called oracle estimator, are proposed, and bounds for their risks are derived. These bounds provide insight into the significance of the various parameters relevant to one-shot learning.
On Robustness of Principal Component Regression: Author Response
We begin by thanking all reviewers for their extremely encouraging and helpful responses. We agree that the fact that we apply PCR to both the training and testing covariates should be placed more explicitly in the context of transductive semi-supervised learning. We have strived to interpret our major theoretical results (Thm 4.2 & Thm 5.1), e.g., by providing examples of natural generating processes under which our bounds (e.g., Proposition 4.2) should be tight. The empirical results support our theoretical guarantees.
Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
In this paper we focus on principal component regression and its application to high-dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under a double asymptotic framework where both the dimension $d$ and the sample size $n$ can increase, borrowing strength from recent developments in minimax-optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal component regression over least squares estimation under the Gaussian model. Second, we propose and analyze a new robust sparse principal component regression for high-dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well-known distributions such as the multivariate Gaussian, rank-deficient Gaussian, $t$, Cauchy, and logistic. It allows the random vector to be heavy-tailed and to have tail dependence. These extra flexibilities make it well suited for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate, and it is therefore a good alternative to Gaussian-based methods. Experiments on synthetic and real-world data illustrate the empirical usefulness of the proposed method.