de-biased estimator
Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution
Ridge regression was first introduced to data analysis by Hoerl (1959) and later formulated in Hoerl and Kennard (1970b,a) to provide a robust solution to some of the persistent challenges encountered in traditional linear regression techniques; see Hoerl (1985) for a nice review. Emerging as a fundamental technique in predictive modeling, ridge regression addresses issues such as multicollinearity and overfitting, which commonly afflict predictive models dealing with high-dimensional data. Since its inception, ridge regression has remained in wide practical use because it outperforms the least-squares estimator in various scenarios, as is evident in applications across neuroscience, chemistry, biology, and economics; see Leonard et al. (2023), Zahrt et al. (2019), Otwinowski and Plotkin (2014), Giannone et al. (2021), and Abadie and Kasy (2019), among others. From a shrinkage perspective, the ridge estimator also dominates the least-squares solution in the sense that its mean-squared error (MSE) can be smaller, which provides a reasonable explanation for the empirical effectiveness of ridge estimators; see Theobald (1974), Athey and Imbens (2019), Hastie (2020), Hansen (2022a), and the comprehensive introduction to ridge regression in van Wieringen (2023). The ridge estimator offers a closed-form expression that simplifies both theoretical and empirical analyses. It aligns with the dense modeling techniques of Giannone et al. (2021), which acknowledge the potential significance of all explanatory variables for prediction. Empirical studies, such as those in Giannone et al. (2021), indicate that dense models tend to outperform sparse ones in out-of-sample economic prediction. Similarly, Abadie and Kasy (2019) find that ridge estimators dominate the lasso and pre-testing estimators in terms of risk when the effects of different predictors on the dependent variable are "smoothly distributed".
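To make the closed-form expression mentioned above concrete, the following is a minimal sketch of the textbook ridge estimator, (X'X + lam I)^{-1} X'y, in a setting with more predictors than observations. The function name `ridge_closed_form`, the penalty value, the penalty scaling (no 1/n factor), and the simulated data are all assumptions made for illustration; the sketch does not implement the bias correction studied in the paper.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Textbook ridge estimator: solve (X'X + lam * I) beta = X'y.

    The scaling of the penalty (no 1/n factor) is an assumed convention.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated high-dimensional design (p > n), purely illustrative.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)   # dense coefficient vector
y = X @ beta + 0.5 * rng.standard_normal(n)

beta_ridge = ridge_closed_form(X, y, lam=1.0)

# Minimum-norm least-squares solution, for comparison of coefficient norms.
beta_ls = np.linalg.pinv(X) @ y
print(np.linalg.norm(beta_ridge), np.linalg.norm(beta_ls))
```

Because the penalty makes X'X + lam I invertible for any lam > 0, the solve step is well defined even when p exceeds n, and the ridge coefficients are never larger in Euclidean norm than the minimum-norm least-squares solution.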
Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a given parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values.
Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps
Li, Gen, Cai, Changxiao, Gu, Yuantao, Poor, H. Vincent, Chen, Yuxin
Eigenvector perturbation analysis plays a vital role in various statistical data science applications. A large body of prior work, however, has focused on establishing $\ell_{2}$ eigenvector perturbation bounds, which are often highly inadequate in addressing tasks that rely on the fine-grained behavior of an eigenvector. This paper makes progress on this front by studying the perturbation of linear functions of an unknown eigenvector. Focusing on two fundamental problems (matrix denoising and principal component analysis) in the presence of Gaussian noise, we develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector. In order to mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators that (1) achieve minimax lower bounds for a family of scenarios (modulo some logarithmic factor), and (2) can be computed in a data-driven manner without sample splitting. Notably, the proposed estimators are nearly minimax optimal even when the associated eigen-gap is substantially smaller than what is required in prior theory.
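As a rough illustration of the "plug-in" estimator referred to above, the sketch below simulates rank-one matrix denoising with symmetric Gaussian noise and estimates a linear function a'u of the leading eigenvector by substituting the empirical eigenvector. The helper name `plug_in_linear_form`, the dimension, the signal strength, and the choice a = u are assumptions for illustration; the de-biased estimators of the paper are not reproduced here.

```python
import numpy as np

def plug_in_linear_form(M, a):
    """Plug-in estimate of |a'u|: use the leading eigenvector of symmetric M.

    The absolute value removes the global sign ambiguity of eigenvectors.
    """
    eigvals, eigvecs = np.linalg.eigh(M)
    u_hat = eigvecs[:, np.argmax(eigvals)]
    return abs(a @ u_hat)

# Illustrative rank-one model: M = lam_star * u u' + H (all values assumed).
rng = np.random.default_rng(0)
n = 200
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
lam_star = 5.0 * np.sqrt(n)                 # assumed signal strength
G = rng.standard_normal((n, n))
H = (G + G.T) / np.sqrt(2)                  # symmetric Gaussian noise
M = lam_star * np.outer(u_star, u_star) + H

# With a = u_star the true value of |a'u| is exactly 1.
estimate = plug_in_linear_form(M, u_star)
print(estimate)
```

With a equal to the true eigenvector, the Cauchy-Schwarz inequality forces the plug-in value to be at most 1, so any estimation error in this particular configuration shows up as shrinkage below the truth. This is one simple way to see why a bias correction can be needed.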
Numerical comparisons between Bayesian and frequentist low-rank matrix completion: estimation accuracy and uncertainty quantification
In this paper we perform numerous numerical studies of the low-rank matrix completion problem. We compare Bayesian approaches with a recently introduced de-biased estimator, which provides a useful way to build confidence intervals of interest. From a theoretical viewpoint, the de-biased estimator comes with a sharp minimax-optimal rate of estimation error, whereas the Bayesian approach reaches this rate with an additional logarithmic factor. Our simulation studies show the interesting result that the de-biased estimator is just as good as the Bayesian estimators. Moreover, Bayesian approaches are much more stable and can outperform the de-biased estimator in the case of small samples. However, we also find that the length of the confidence interval given by the de-biased estimator for an entry is absolutely shorter than the length of the considered credible interval. These findings suggest further theoretical studies on the estimation error and the concentration of Bayesian methods, as such results are quite limited at present.
Inference and Uncertainty Quantification for Noisy Matrix Completion
Chen, Yuxin, Fan, Jianqing, Ma, Cong, Yan, Yuling
Noisy matrix completion aims at estimating a low-rank matrix given only partial and corrupted entries. Despite substantial progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained estimates and how to perform statistical inference on the unknown matrix (e.g., constructing a valid and short confidence interval for an unseen entry). This paper takes a step towards inference and uncertainty quantification for noisy matrix completion. We develop a simple procedure to compensate for the bias of the widely used convex and nonconvex estimators. The resulting de-biased estimators admit nearly precise non-asymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals/regions for, say, the missing entries and the low-rank factors. Our inferential procedures do not rely on sample splitting, thus avoiding unnecessary loss of data efficiency. As a byproduct, we obtain a sharp characterization of the estimation accuracy of our de-biased estimators, which, to the best of our knowledge, are the first tractable algorithms that provably achieve full statistical efficiency (including the preconstant). The analysis herein is built upon the intimate link between convex and nonconvex optimization, an appealing feature recently discovered by Chen et al. (2019).
Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs
Ma, Cong, Lu, Junwei, Liu, Han
We develop a new modeling framework for Inter-Subject Analysis (ISA). The goal of ISA is to explore the dependency structure between different subjects, treating the intra-subject dependency as a nuisance. It has important applications in neuroscience, where it is used to explore the functional connectivity between brain regions under natural stimuli. Our framework is based on Gaussian graphical models, under which ISA can be converted to the problem of estimation and inference of the inter-subject precision matrix. The main statistical challenge is that we do not impose a sparsity constraint on the whole precision matrix; we only assume that the inter-subject part is sparse. For estimation, we propose to estimate an alternative parameter to get around the non-sparsity issue, and this estimator can achieve asymptotic consistency even if the intra-subject dependency is dense. For inference, we propose an "untangle and chord" procedure to de-bias our estimator. It is valid without the sparsity assumption on the inverse Hessian of the log-likelihood function. This inferential method is general and can be applied to many other statistical problems, so it is of independent theoretical interest. Numerical experiments on both simulated and brain imaging data validate our methods and theory.