
Collaborating Authors: Cui, Elvis Han


A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome

arXiv.org Machine Learning

This paper develops a semiparametric Bayesian instrumental variable analysis method for estimating the causal effect of an endogenous variable in the presence of unobserved confounders and measurement errors, with partly interval-censored time-to-event data, where event times are observed exactly for some subjects but left-censored, right-censored, or interval-censored for others. Our method is based on a two-stage Dirichlet process mixture instrumental variable (DPMIV) model which simultaneously models the first-stage random error term for the exposure variable and the second-stage random error term for the time-to-event outcome using a bivariate Gaussian mixture of the Dirichlet process (DPM) model. The DPM model can be broadly understood as a mixture model with an unspecified number of Gaussian components, which relaxes the normal error assumptions and allows the number of mixture components to be determined by the data. We develop an MCMC algorithm for the DPMIV model tailored for partly interval-censored data and conduct extensive simulations to assess the performance of our DPMIV method in comparison with some competing methods. Our simulations reveal that the proposed method is robust under different error distributions and can outperform its parametric counterpart under various scenarios. We further demonstrate the effectiveness of our approach on UK Biobank data to investigate the causal effect of systolic blood pressure on time to development of cardiovascular disease from the onset of diabetes mellitus.
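The abstract's core building block is a Dirichlet process mixture, i.e., a Gaussian mixture whose number of components is not fixed in advance. A minimal sketch of sampling from such a model via truncated stick-breaking is below; the truncation level `trunc`, the base measure, and all parameter values are illustrative assumptions, not the paper's bivariate two-stage specification.

```python
import numpy as np

def sample_dpm_gaussian(n, alpha=1.0, trunc=20, seed=0):
    """Draw n points from a truncated stick-breaking DP mixture of 1-D Gaussians.
    Illustrative sketch only: the paper uses a bivariate DPM for two error terms."""
    rng = np.random.default_rng(seed)
    # Stick-breaking weights: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k}(1 - v_j)
    v = rng.beta(1.0, alpha, size=trunc)
    w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    w = w / w.sum()  # renormalize after truncation
    # Component means drawn from an assumed base measure G0 = N(0, 3^2), unit scale
    mus = rng.normal(0.0, 3.0, size=trunc)
    z = rng.choice(trunc, size=n, p=w)  # latent component assignments
    return rng.normal(mus[z], 1.0), z

x, z = sample_dpm_gaussian(500)
```

With a small concentration parameter `alpha`, most of the stick-breaking weight falls on a few components, which is how the data, rather than the analyst, end up determining the effective number of mixture components.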


A Metric-based Principal Curve Approach for Learning One-dimensional Manifold

arXiv.org Machine Learning

The principal curve is a well-known statistical method rooted in manifold learning that uses concepts from differential geometry. In this paper, we propose a novel metric-based principal curve (MPC) method that learns the one-dimensional manifold of spatial data. Experiments on synthetic datasets and real applications using the MNIST dataset show that our method can learn the one-dimensional manifold well in terms of shape.
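To make the principal-curve idea concrete, here is a minimal sketch of a single Hastie-Stuetzle-style pass: parameterize the points by their projection onto the first principal component, then smooth each coordinate against that parameter. The polynomial smoother and its degree are stand-in assumptions; this is not the metric-based method of the paper.

```python
import numpy as np

def principal_curve_step(X, degree=4):
    """One illustrative principal-curve pass: project onto the first principal
    component to get a 1-D parameter t, then smooth each coordinate against t.
    A low-degree polynomial stands in for a scatterplot smoother."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    t = Xc @ Vt[0]  # 1-D parameterization from the first principal component
    order = np.sort(t)
    curve = np.column_stack([
        np.polyval(np.polyfit(t, Xc[:, j], degree), order)
        for j in range(X.shape[1])
    ]) + mu
    return curve

# Noisy points along a sine arc
rng = np.random.default_rng(1)
t = np.linspace(0, np.pi, 200)
X = np.column_stack([t, np.sin(t)]) + rng.normal(0, 0.05, (200, 2))
curve = principal_curve_step(X)
```

In the full algorithm, the projection and smoothing steps alternate until the curve stabilizes; one pass already bends the initial straight principal-component line toward the arc of the data.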


Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology, and the Manufacturing Industries

arXiv.org Artificial Intelligence

Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and outperformance relative to its competitors on a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single-cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model, and (iv) matrix completion to impute missing values in a two-compartment model. In addition, we discuss applications to (v) selecting variables optimally in an ecology problem and (vi) designing a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
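The base algorithm behind CSO-MA is the competitive swarm optimizer, in which particles compete in random pairs each iteration and only the loser moves, learning from the winner and the swarm mean. A minimal sketch of that base scheme is below; the swarm size, iteration count, and social factor `phi` are illustrative assumptions, and the mutated-agents extension of the paper is omitted.

```python
import numpy as np

def cso(f, dim=10, n=40, iters=200, lb=-5.0, ub=5.0, phi=0.1, seed=0):
    """Minimal competitive swarm optimizer for minimizing f on [lb, ub]^dim.
    Each iteration: pair particles at random; the loser of each pairwise
    fitness comparison updates its velocity toward the winner and the swarm
    mean, while the winner stays put. Sketch only, without mutated agents."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))
    V = np.zeros((n, dim))
    for _ in range(iters):
        fit = np.apply_along_axis(f, 1, X)
        mean = X.mean(axis=0)
        idx = rng.permutation(n)
        for a, b in zip(idx[::2], idx[1::2]):
            win, lose = (a, b) if fit[a] <= fit[b] else (b, a)
            r1, r2, r3 = rng.random((3, dim))
            V[lose] = r1 * V[lose] + r2 * (X[win] - X[lose]) + phi * r3 * (mean - X[lose])
            X[lose] = np.clip(X[lose] + V[lose], lb, ub)
    fit = np.apply_along_axis(f, 1, X)
    return X[fit.argmin()], fit.min()

# Toy usage: minimize the sphere function
best_x, best_f = cso(lambda x: np.sum(x**2))
```

Because winners never move, the current best solutions are preserved without an explicit elite archive, which is part of why the scheme scales well and adapts easily to the constrained and cost-structured problems the abstract lists.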


Trajectory-aware Principal Manifold Framework for Data Augmentation and Image Generation

arXiv.org Artificial Intelligence

Data augmentation for deep learning benefits model training, image transformation, medical imaging analysis and many other fields. Many existing methods generate new samples from a parametric distribution, such as the Gaussian, with little attention to generating samples along the data manifold in either the input or feature space. In this paper, we verify that there are theoretical and practical advantages to using the principal manifold hidden in the feature space rather than the Gaussian distribution. We then propose a novel trajectory-aware principal manifold framework to restore the manifold backbone and generate samples along a specific trajectory. On top of the autoencoder architecture, we further introduce an intrinsic dimension regularization term to make the manifold more compact and enable few-shot image generation. Experimental results show that the novel framework is able to extract a more compact manifold representation, improve classification accuracy and generate smooth transformations among few samples.
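The contrast the abstract draws is between sampling Gaussian noise and walking a trajectory through a learned feature space. A dependency-free sketch of the trajectory idea is below, with a linear PCA projection standing in for the paper's autoencoder encoder/decoder; the function name and all settings are hypothetical.

```python
import numpy as np

def trajectory_augment(X, i, j, n_new=5, k=2):
    """Generate new samples by walking the straight line between samples i and j
    in a k-dimensional PCA 'feature space' and decoding back to input space.
    PCA is an assumed stand-in for the paper's autoencoder, kept linear so the
    sketch stays self-contained."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k]                                  # linear encoder/decoder
    zi, zj = (X[i] - mu) @ W.T, (X[j] - mu) @ W.T
    ts = np.linspace(0.0, 1.0, n_new + 2)[1:-1]  # interior points of the path
    Z = np.outer(1.0 - ts, zi) + np.outer(ts, zj)
    return Z @ W + mu                           # decode the latent trajectory

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
new = trajectory_augment(X, 0, 1)
```

The generated points interpolate between two real samples in feature space rather than scattering around one of them, which is the behavior the trajectory-aware framework generalizes to curved principal manifolds.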


A Roadmap to Asymptotic Properties with Applications to COVID-19 Data

arXiv.org Artificial Intelligence

A good estimator should, at least in the asymptotic sense, be close to the true quantity that it wishes to estimate, and we should be able to give an uncertainty measure based on a finite sample size. An estimator with well-behaved asymptotic properties can help clinicians in many ways, such as reducing the number of patients needed in a trial, cutting down the budget for toxicology studies and providing insightful findings for late-phase trials. As suggested by Sir R. A. Fisher [1], generations of statisticians have worked on the so-called "consistency" and "asymptotic normality" of estimators. The former is based on different versions of the law of large numbers (LLN) and the latter on various types of central limit theorems (CLT) [2]. In addition to these two main tools, statisticians also apply other important but less well-known results from probability theory and other mathematical fields. To name a few: extreme value theory for distributions of maxima and minima [3], convex analysis for checking the optimality of a statistical design [4], asymptotic relative efficiency (ARE) of an estimator [5], concentration inequalities for finite-sample properties and selection consistency [6], and other non-normal limits, robustness and simultaneous confidence bands of common statistical estimators [7, 8]. Despite this variety, consistency and asymptotic normality remain the most celebrated and important properties of statistical estimators in both academia and industry. Hence, in the following, we present a roadmap to consistency and asymptotic normality. Then we provide their applications in toxicology studies and clinical trials using a COVID-19 dataset.
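The two properties the roadmap centers on are easy to see empirically. A minimal simulation sketch is below: for Exponential(1) data (true mean 1, variance 1), the sample mean concentrates at 1 (LLN/consistency) and the standardized mean sqrt(n)*(mean - 1) has spread close to 1, as the CLT predicts; the sample sizes and seed are arbitrary choices.

```python
import numpy as np

def lln_clt_demo(n=2000, reps=5000, seed=42):
    """Empirically illustrate consistency (LLN) and asymptotic normality (CLT)
    for the sample mean of Exponential(1) data, a simple non-normal case with
    true mean 1 and variance 1."""
    rng = np.random.default_rng(seed)
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (means - 1.0)  # approximately N(0, 1) for large n
    return means.mean(), z.std()

m, s = lln_clt_demo()
```

Here `m` hugs the true mean and `s` hugs the asymptotic standard deviation of 1, even though the underlying data are skewed; this is the same logic that justifies normal-based confidence intervals and sample-size calculations in the trial applications the text goes on to discuss.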