AITopics | Ning, Yang

Collaborating Authors

Ning, Yang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

High-Dimensional Inference for Cluster-Based Graphical Models

Eisenach, Carson, Bunea, Florentina, Ning, Yang, Dinicu, Claudiu

arXiv.org Machine LearningJun-13-2018

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models. Unlike standard graphical models, variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. We derive estimates tailored to these graphs, with the goal of pattern recovery under false discovery rate (FDR) control. Our study reveals that likelihood based inference for the latent graph is analytically intractable, and we develop alternative estimation and inference strategies. We replace the likelihood of the data by appropriate empirical risk functions that allow for valid inference in both graphical models under study. Our main results are Berry-Esseen central limit theorems for the proposed estimators, which are proved under weaker assumptions than those employed in the existing literature on Gaussian graphical model inference. We make explicit the implications of the asymptotic approximations on graph recovery under FDR control, and show when it can be controlled asymptotically. Our analysis takes into account the uncertainty induced by the initial clustering step. We find that the errors induced by clustering are asymptotically ignorable in the follow-up analysis, under no further restrictions on the parameter space for which inference is valid. The theoretical properties of the proposed procedures are verified on simulated data and an fMRI data analysis.

artificial intelligence, estimator, health & medicine, (19 more...)

arXiv.org Machine Learning

1806.05139

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.87)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Adaptive Estimation in Structured Factor Models with Applications to Overlapping Clustering

Bing, Xin, Bunea, Florentina, Ning, Yang, Wegkamp, Marten

arXiv.org Machine LearningMar-21-2018

This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix A in a sparse latent factor model X = AZ + E, for an observable random vector X in Rp, with correlated unobservable factors Z \in RK, with K unknown, and independent noise E. Each row of A is scaled and sparse. In order to identify the loading matrix A, we require the existence of pure variables, which are components of X that are associated, via A, with one and only one latent factor. Despite the fact that the number of factors K, the number of the pure variables, and their location are all unknown, we only require a mild condition on the covariance matrix of Z, and a minimum of only two pure variables per latent factor to show that A is uniquely defined, up to signed permutations. Our proofs for model identifiability are constructive, and lead to our novel estimation method of the number of factors and of the set of pure variables, from a sample of size n of observations on X. This is the first step of our LOVE algorithm, which is optimization-free, and has low computational complexity of order p2. The second step of LOVE is an easily implementable linear program that estimates A. We prove that the resulting estimator is minimax rate optimal up to logarithmic factors in p. The model structure is motivated by the problem of overlapping variable clustering, ubiquitous in data science. We define the population level clusters as groups of those components of X that are associated, via the sparse matrix A, with the same unobservable latent factor, and multi-factor association is allowed. Clusters are respectively anchored by the pure variables, and form overlapping sub-groups of the p-dimensional random vector X. The Latent model approach to OVErlapping clustering is reflected in the name of our algorithm, LOVE.

health & medicine, matrix, optimization problem, (21 more...)

arXiv.org Machine Learning

1704.06977

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre:

Research Report (0.64)
Workflow (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

Add feedback

A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations

Neykov, Matey, Ning, Yang, Liu, Jun S., Liu, Han

arXiv.org Machine LearningJun-22-2016

We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations to a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-estimation theory of confidence regions for high dimensional problems. Different from existing methods, all of which require the specification of the likelihood or pseudo-likelihood, our framework is likelihood-free. As a result, our approach provides valid inference for a broad class of high dimensional constrained estimating equation problems, which are not covered by existing methods. Such examples include, noisy compressed sensing, instrumental variable regression, undirected graphical models, discriminant analysis and vector autoregressive models. We present detailed theoretical results for all these examples. Finally, we conduct thorough numerical simulations, and a real dataset analysis to back up the developed theoretical results.

artificial intelligence, assumption, health & medicine, (17 more...)

arXiv.org Machine Learning

1510.08986

Country: North America > United States (0.92)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality

Wang, Zhaoran, Gu, Quanquan, Ning, Yang, Liu, Han

Neural Information Processing SystemsDec-31-2015

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based on the obtained estimator, we propose a new inferential procedure for testing hypotheses for low dimensional components of high dimensional parameters. For a broad family of statistical models, our framework establishes the first computationally feasible approach for optimal estimation and asymptotic inference in high dimensions.

artificial intelligence, em algorithm, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Likelihood Ratio Framework for High Dimensional Semiparametric Regression

Ning, Yang, Zhao, Tianqi, Liu, Han

arXiv.org Machine LearningNov-22-2015

We propose a likelihood ratio based inferential framework for high dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high dimensional data analysis, including incomplete data, selection bias, and heterogeneous multitask learning. Our work has three main contributions. (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model without the need of estimating the unknown base measure function. (ii) We propose a new framework to construct post-regularization confidence regions and tests for the low dimensional components of high dimensional parameters. Unlike existing post-regularization inferential methods, our approach is based on a novel directional likelihood. In particular, the framework naturally handles generic regularized estimators with nonconvex penalty functions and it can be used to infer least false parameters under misspecified models. (iii) We develop new concentration inequalities and normal approximation results for U-statistics with unbounded kernels, which are of independent interest. We demonstrate the consequences of the general theory by using an example of missing data problem. Extensive simulation studies and real data analysis are provided to illustrate our proposed approach.

exp, health & medicine, oncology, (18 more...)

arXiv.org Machine Learning

1412.2295

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

On Semiparametric Exponential Family Graphical Models

Yang, Zhuoran, Ning, Yang, Liu, Han

arXiv.org Machine LearningOct-15-2015

We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data. Different from the existing mixed graphical models, we allow the nodewise conditional distributions to be semiparametric generalized linear models with unspecified base measure functions. Thus, one advantage of our method is that it is unnecessary to specify the type of each node and the method is more convenient to apply in practice. Under the proposed model, we consider both problems of parameter estimation and hypothesis testing in high dimensions. In particular, we propose a symmetric pairwise score test for the presence of a single edge in the graph. Compared to the existing methods for hypothesis tests, our approach takes into account of the symmetry of the parameters, such that the inferential results are invariant with respect to the different parametrizations of the same edge. Thorough numerical simulations and a real data example are provided to back up our results.

artificial intelligence, machine learning, semiparametric exponential family graphical model

arXiv.org Machine Learning

1412.8697

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (0.80)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

Gu, Quanquan, Cao, Yuan, Ning, Yang, Liu, Han

arXiv.org Machine LearningJun-30-2015

This paper proposes a unified framework to quantify local and global inferential uncertainty for high dimensional nonparanormal graphical models. In particular, we consider the problems of testing the presence of a single edge and constructing a uniform confidence subgraph. Due to the presence of unknown marginal transformations, we propose a pseudo likelihood based inferential approach. In sharp contrast to the existing high dimensional score test method, our method is free of tuning parameters given an initial estimator, and extends the scope of the existing likelihood based inferential framework. Furthermore, we propose a U-statistic multiplier bootstrap method to construct the confidence subgraph. We show that the constructed subgraph is contained in the true graph with probability greater than a given nominal level. Compared with existing methods for constructing confidence subgraphs, our method does not rely on Gaussian or sub-Gaussian assumptions. The theoretical properties of the proposed inferential methods are verified by thorough numerical experiments and real data analysis.

artificial intelligence, assumption 4, health & medicine, (15 more...)

arXiv.org Machine Learning

1502.02347

Country: North America > United States > Virginia > Albemarle County > Charlottesville (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.66)

Add feedback

High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality

Wang, Zhaoran, Gu, Quanquan, Ning, Yang, Liu, Han

arXiv.org Machine LearningJan-26-2015

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based on the obtained estimator, we propose new inferential procedures for testing hypotheses and constructing confidence intervals for low dimensional components of high dimensional parameters. For a broad family of statistical models, our framework establishes the first computationally feasible approach for optimal estimation and asymptotic inference in high dimensions. Our theory is supported by thorough numerical results.

artificial intelligence, inequality, machine learning, (15 more...)

arXiv.org Machine Learning

1412.8729

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A General Theory of Hypothesis Tests and Confidence Regions for Sparse High Dimensional Models

Ning, Yang, Liu, Han

arXiv.org Machine LearningJan-21-2015

We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our approach provides a general framework for high dimensional inference and is applicable to a wide range of applications. From the testing perspective, we develop general theorems to characterize the limiting distributions of the decorrelated score test statistic under both null hypothesis and local alternatives. These results provide asymptotic guarantees on the type I errors and local powers of the proposed test. Furthermore, we show that the decorrelated score function can be used to construct point and confidence region estimators that are semiparametrically efficient. We also generalize this framework to broaden its applications. First, we extend it to handle high dimensional null hypothesis, where the number of parameters of interest can increase exponentially fast with the sample size. Second, we establish the theory for model misspecification. Third, we go beyond the likelihood framework, by introducing the generalized score test based on general loss functions. Thorough numerical studies are conducted to back up the developed theoretical results.

artificial intelligence, assumption 3, machine learning, (17 more...)

arXiv.org Machine Learning

1412.8765

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Testing and Confidence Intervals for High Dimensional Proportional Hazards Model

Fang, Ethan X., Ning, Yang, Liu, Han

arXiv.org Machine LearningDec-16-2014

This paper proposes a decorrelation-based approach to test hypotheses and construct confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics, establish their semiparametric optimality. We also develop new procedures for constructing pointwise confidence intervals for the baseline hazard function and baseline survival function. Thorough numerical results are provided to back up our theory.

artificial intelligence, assumption 2, health & medicine, (13 more...)

arXiv.org Machine Learning

1412.5158

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback