Qin, Yichen
Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models
Zhu, Xiaorui, Qin, Yichen, Wang, Peng
High-dimensional data analysis plays an important role in modern scientific discoveries. There has been extensive work on high-dimensional variable selection and estimation using penalized regressions, such as Lasso (Tibshirani, 1996), SCAD (Fan and Li, 2001), MCP (Zhang et al., 2010), and selection by partitioning solution paths (Liu and Wang, 2018). In recent years, inference for the true regression coefficients and the true model has begun to attract attention. A major challenge of high-dimensional inference is quantifying the uncertainty of the coefficient estimates, because this uncertainty has two components: the uncertainty in parameter estimation given the selected model and the uncertainty in selecting the model. Both are difficult to estimate and are actively studied. For inference on the regression coefficients, Scheffé (1953) introduces the notion of simultaneous confidence intervals, a sequence of intervals that jointly contain the true coefficients with a given probability. For high-dimensional linear models, Dezeure et al. (2017) and Zhang and Cheng (2017) construct simultaneous confidence intervals using the debiased Lasso approach (van de Geer et al., 2014; Zhang and Zhang, 2014).
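To make the notion of simultaneous coverage concrete, the requirement can be written as follows. The notation is an assumption for illustration and is not taken from the paper: $\beta \in \mathbb{R}^p$ is the true coefficient vector, $[l_j, u_j]$ is the interval for the $j$-th coefficient, and $1 - \alpha$ is the nominal level.

```latex
% Simultaneous coverage: all p intervals must hold at once.
\[
\Pr\bigl( \beta_j \in [\,l_j, u_j\,] \ \text{for all } j = 1, \dots, p \bigr) \;\ge\; 1 - \alpha .
\]
% Marginal (coordinate-wise) coverage is the weaker requirement.
\[
\Pr\bigl( \beta_j \in [\,l_j, u_j\,] \bigr) \;\ge\; 1 - \alpha \quad \text{for each fixed } j .
\]
```

The first statement implies the second but not the reverse, which is why intervals calibrated one coordinate at a time generally fail to cover all coefficients jointly at the stated level.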
Statistical inference on random dot product graphs: a survey
Athreya, Avanti, Fishkind, Donniell E., Levin, Keith, Lyzinski, Vince, Park, Youngser, Qin, Yichen, Sussman, Daniel L., Tang, Minh, Vogelstein, Joshua T., Priebe, Carey E.
The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.
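The two objects this abstract centers on, an independent-edge graph with dot-product edge probabilities and its adjacency spectral embedding, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the survey's implementation: the function names `sample_rdpg` and `adjacency_spectral_embedding`, the two latent positions, and the graph size are all hypothetical choices made for the example.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a symmetric, loop-free adjacency matrix with
    P(A_ij = 1) = <X_i, X_j>, edges independent for i < j."""
    P = X @ X.T                                   # edge-probability matrix
    n = P.shape[0]
    draws = (rng.random((n, n)) < P).astype(int)  # Bernoulli draws
    upper = np.triu(draws, k=1)                   # keep strict upper triangle
    return upper + upper.T                        # symmetrize, zero diagonal

def adjacency_spectral_embedding(A, d):
    """Scale the eigenvectors of the d largest eigenvalues of A
    by the square roots of those eigenvalues."""
    vals, vecs = np.linalg.eigh(A)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]              # indices of the d largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Hypothetical example: 200 vertices, two distinct latent positions in R^2.
rng = np.random.default_rng(0)
X = np.repeat(np.array([[0.8, 0.2], [0.2, 0.8]]), 100, axis=0)
A = sample_rdpg(X, rng)
X_hat = adjacency_spectral_embedding(A, d=2)      # one estimated position per vertex
```

The latent positions are identifiable only up to an orthogonal transformation, so `X_hat` should be compared with `X` after alignment (or through `X_hat @ X_hat.T`), not entry by entry.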
Semiparametric spectral modeling of the Drosophila connectome
Priebe, Carey E., Park, Youngser, Tang, Minh, Athreya, Avanti, Lyzinski, Vince, Vogelstein, Joshua T., Qin, Yichen, Cocanougher, Ben, Eichler, Katharina, Zlatic, Marta, Cardona, Albert
We present semiparametric spectral modeling of the complete larval Drosophila mushroom body connectome. Motivated by a thorough exploratory data analysis of the network via Gaussian mixture modeling (GMM) in the adjacency spectral embedding (ASE) representation space, we introduce the latent structure model (LSM) for network modeling and inference. LSM is a generalization of the stochastic block model (SBM) and a special case of the random dot product graph (RDPG) latent position model, and is amenable to semiparametric GMM in the ASE representation space. The resulting connectome code derived via semiparametric GMM composed with ASE captures latent connectome structure and elucidates biologically relevant neuronal properties.
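The nesting of models mentioned above can be partly illustrated in code. The sketch below covers only the outer relationship, that a stochastic block model with a positive semidefinite block matrix is a random dot product graph whose latent positions take finitely many values. It does not implement the LSM itself, and the latent vectors `nu` and block labels `z` are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical latent vectors, one per block (K = 3 blocks, dimension 2).
nu = np.array([[0.8, 0.2],
               [0.2, 0.8],
               [0.5, 0.5]])

# Block connection probabilities implied by the dot products.
B = nu @ nu.T

# Hypothetical block assignment for six vertices.
z = np.array([0, 0, 1, 2, 2, 1])

# RDPG view: each vertex inherits the latent vector of its block.
X = nu[z]
P_rdpg = X @ X.T

# SBM view: look up the block-pair probability for each vertex pair.
P_sbm = B[np.ix_(z, z)]
```

Both constructions give the same edge-probability matrix, so in the embedding space the vertices of such an SBM concentrate around K points, which is consistent with the use of Gaussian mixture modeling on the embedded vertices.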