Solving the Empirical Bayes Normal Means Problem with Correlated Noise

arXiv.org Machine Learning

The Normal Means problem plays a fundamental role in many areas of modern high-dimensional statistics, both in theory and practice. The Empirical Bayes (EB) approach to solving this problem has been shown to be highly effective, again both in theory and practice. However, almost all EB treatments of the Normal Means problem assume that the observations are independent. In real-world applications correlations are ubiquitous, and they can grossly distort EB estimates. Here, exploiting theory from Schwartzman (2010), we develop new EB methods for solving the Normal Means problem that take account of unknown correlations among observations. We provide practical software implementations of these methods, and illustrate them in the context of large-scale multiple testing problems and False Discovery Rate (FDR) control. In realistic numerical experiments our methods compare favorably with other commonly used multiple testing methods.
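
For orientation, the textbook independent-noise version of the EB Normal Means problem can be sketched as below. This is not the correlated-noise method of the abstract: it assumes independent observations x_j ~ N(theta_j, s^2) with known s and a N(0, tau^2) prior, and estimates tau^2 by the method of moments. The function name `eb_normal_means` and the simulated data are illustrative assumptions.

```python
import random

def eb_normal_means(x, s=1.0):
    """Shrink observations x_j ~ N(theta_j, s^2) under an assumed N(0, tau^2) prior."""
    n = len(x)
    # Marginally x_j ~ N(0, tau^2 + s^2), so tau^2 is estimated by the method of moments.
    marginal_var = sum(v * v for v in x) / n
    tau2_hat = max(marginal_var - s * s, 0.0)
    # The posterior mean of theta_j is a common shrinkage factor times x_j.
    shrink = tau2_hat / (tau2_hat + s * s)
    return [shrink * v for v in x], tau2_hat

# Simulated example (assumed values): true tau = 2, noise sd = 1, independent noise.
rng = random.Random(0)
theta = [rng.gauss(0.0, 2.0) for _ in range(2000)]
x = [t + rng.gauss(0.0, 1.0) for t in theta]

est, tau2_hat = eb_normal_means(x)
mse_raw = sum((a - b) ** 2 for a, b in zip(x, theta)) / len(x)
mse_eb = sum((a - b) ** 2 for a, b in zip(est, theta)) / len(x)
```

Under these assumptions the shrunken estimates should have a lower mean squared error than the raw observations. The independence assumption built into the moment estimate is the one the abstract warns can fail when the noise is correlated.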


Bayesian Inference of Spreading Processes on Networks

arXiv.org Machine Learning

Infectious diseases are studied to understand their spreading mechanisms, to evaluate control strategies and to predict the risk and course of future outbreaks. Because people interact with only a small number of individuals, and because the structure of these interactions matters for spreading processes, the pairwise relationships between individuals in a population can be usefully represented by a network. Although the underlying processes of transmission are different, the network approach can be used to study the spread of pathogens in a contact network or the spread of rumors in an online social network. We study simulated simple and complex epidemics on synthetic networks and on two empirical networks, a social / contact network in an Indian village and an online social network in the U.S. Our goal is to learn simultaneously about the spreading process parameters and the source node (first infected node) of the epidemic, given a fixed and known network structure and observations of the states of nodes at several points in time. Our inference scheme is based on approximate Bayesian computation (ABC), an inference technique for complex models with likelihood functions that are either expensive to evaluate or analytically intractable. ABC enables us to adopt a Bayesian approach to the problem despite the posterior distribution being very complex. Our method is agnostic about the topology of the network and the nature of the spreading process. It generally performs well and, somewhat counter-intuitively, the inference problem appears to be easier on more heterogeneous network topologies, which enhances its future applicability to real-world settings where few networks have homogeneous topologies.
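
A minimal sketch of the ABC idea, jointly inferring a transmission probability and the source node, might look as follows. It is plain rejection ABC that keeps the closest draws, not necessarily the scheme used in the paper. The discrete-time SI model, the ring network, the uniform priors, the distance function and all names (`simulate_si`, `abc_rejection`, `beta`) are assumptions made for illustration.

```python
import random

def simulate_si(neighbors, source, beta, steps, rng):
    """Discrete-time SI epidemic; returns the infected set after each step."""
    infected = {source}
    history = []
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in neighbors[u]:
                # Each infected node infects each susceptible neighbor w.p. beta.
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected = infected | newly
        history.append(frozenset(infected))
    return history

def distance(sim, obs):
    """Number of nodes whose state differs, summed over observation times."""
    return sum(len(a ^ b) for a, b in zip(sim, obs))

def abc_rejection(neighbors, observed, n_draws, keep, rng):
    """Draw (beta, source) from the prior and keep the `keep` closest simulations."""
    nodes = list(neighbors)
    draws = []
    for _ in range(n_draws):
        beta = rng.random()          # assumed prior: beta ~ Uniform(0, 1)
        source = rng.choice(nodes)   # assumed prior: uniform over nodes
        sim = simulate_si(neighbors, source, beta, len(observed), rng)
        draws.append((distance(sim, observed), beta, source))
    draws.sort(key=lambda d: d[0])
    return [(beta, source) for _, beta, source in draws[:keep]]

# Toy network (assumed): a ring of 12 nodes with a known, fixed structure.
n = 12
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
rng = random.Random(1)
observed = simulate_si(neighbors, source=3, beta=0.4, steps=4, rng=rng)
posterior = abc_rejection(neighbors, observed, n_draws=3000, keep=60, rng=rng)
```

The retained pairs in `posterior` approximate the joint posterior over the transmission probability and the source. No likelihood is ever evaluated, only forward simulations and a distance to the observed node states.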


Click click snap: One look at patient's face, and AI can identify rare genetic diseases

#artificialintelligence

WASHINGTON D.C. [USA]: According to a recent study, a new artificial intelligence technology can accurately identify rare genetic disorders using a photograph of a patient's face. Named DeepGestalt, the AI technology outperformed clinicians in identifying a range of syndromes in three trials and could add value in personalised care, CNN reported. The study was published in the journal Nature Medicine. According to the study, eight per cent of the population has a disease with key genetic components, and many of these diseases may have recognisable facial features. The study further adds that the technology could identify, for example, Angelman syndrome, a disorder affecting the nervous system with characteristic features such as a wide mouth with widely spaced teeth. Speaking about it, Yaron Gurovich, the chief technology officer at FDNA and lead researcher of the study, said, "It demonstrates how one can successfully apply state of the art algorithms, such as deep learning, to a challenging field where the available data is small, unbalanced in terms of available patients per condition, and where the need to support a large amount of conditions is great."


A Bayesian Group Sparse Multi-Task Regression Model for Imaging Genetics

arXiv.org Machine Learning

Motivation: Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. Wang et al. (Bioinformatics, 2012) have developed an approach for the analysis of imaging genomic studies using penalized multi-task regression with regularization based on a novel group $l_{2,1}$-norm penalty which encourages structured sparsity at both the gene level and SNP level. While incorporating a number of useful features, the proposed method only furnishes a point estimate of the regression coefficients; techniques for conducting statistical inference are not provided. A new Bayesian method is proposed here to overcome this limitation. Results: We develop a Bayesian hierarchical modeling formulation where the posterior mode corresponds to the estimator proposed by Wang et al. (Bioinformatics, 2012), and an approach that allows for full posterior inference including the construction of interval estimates for the regression parameters. We show that the proposed hierarchical model can be expressed as a three-level Gaussian scale mixture and this representation facilitates the use of a Gibbs sampling algorithm for posterior simulation. Simulation studies demonstrate that the interval estimates obtained using our approach achieve adequate coverage probabilities and outperform those obtained from the nonparametric bootstrap. Our proposed methodology is applied to the analysis of neuroimaging and genetic data collected as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI), and this analysis of the ADNI cohort clearly demonstrates the added value of incorporating interval estimation beyond only point estimation when relating SNPs to brain imaging endophenotypes.
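
To make the structured-sparsity penalty concrete, the sketch below evaluates a combined group and row penalty on a coefficient matrix W, with rows indexing SNPs and columns indexing imaging phenotypes. The exact form is an assumption based on the usual definition: a group term summing the Frobenius norm of each gene's block of rows, plus an $l_{2,1}$ term summing the Euclidean norm of each row. The weights `gamma1` and `gamma2` and the function name are hypothetical.

```python
import math

def group_sparse_penalty(W, groups, gamma1=1.0, gamma2=1.0):
    """Assumed penalty: gamma1 * sum_k ||W[group_k]||_F + gamma2 * sum_i ||W[i]||_2.

    W      : list of rows, one row of coefficients per SNP (columns = phenotypes)
    groups : list of lists of row indices, one list per gene
    """
    # Gene-level term: one Frobenius norm per group of SNP rows.
    group_term = sum(
        math.sqrt(sum(w * w for i in rows for w in W[i])) for rows in groups
    )
    # SNP-level term: one Euclidean norm per row (the l_{2,1} norm of W).
    row_term = sum(math.sqrt(sum(w * w for w in row)) for row in W)
    return gamma1 * group_term + gamma2 * row_term

# Toy example (assumed values): 3 SNPs, 2 phenotypes, 2 genes.
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
groups = [[0, 1], [2]]
penalty = group_sparse_penalty(W, groups)
```

Because each term is a sum of unsquared norms, setting a whole gene block or a whole SNP row to zero removes its contribution entirely, which is what encourages sparsity at both levels.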


Disease Progression Timeline Estimation for Alzheimer's Disease using Discriminative Event Based Modeling

arXiv.org Machine Learning

Alzheimer's Disease (AD) is characterized by a cascade of biomarkers becoming abnormal, the pathophysiology of which is very complex and largely unknown. Event-based modeling (EBM) is a data-driven technique to estimate the sequence in which biomarkers for a disease become abnormal based on cross-sectional data. It can help in understanding the dynamics of disease progression and facilitate early diagnosis and prognosis. In this work we propose a novel discriminative approach to EBM, which is shown to be more accurate than existing state-of-the-art EBM methods. The method first estimates for each subject an approximate ordering of events. Subsequently, the central ordering over all subjects is estimated by fitting a generalized Mallows model to these approximate subject-specific orderings. We also introduce the concept of relative distance between events, which helps in creating a disease progression timeline. We then propose a method to stage subjects by placing them on the estimated disease progression timeline. We evaluated the proposed method on Alzheimer's Disease Neuroimaging Initiative (ADNI) data and compared the results with existing state-of-the-art EBM methods. We also performed extensive experiments on synthetic data simulating the progression of Alzheimer's disease. The event orderings obtained on ADNI data seem plausible and are in agreement with the current understanding of progression of AD. The proposed patient staging algorithm performed consistently better than that of state-of-the-art EBM methods. Event orderings obtained in simulation experiments were more accurate than those of other EBM methods and the estimated disease progression timeline was observed to correlate with the timeline of actual disease progression. The results of these experiments are encouraging and suggest that discriminative EBM is a promising approach to disease progression modeling.
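
The step of combining subject-specific orderings into a central ordering can be illustrated with a simplified sketch. It uses the standard Mallows model with Kendall tau distance rather than the generalized model named in the abstract. Under that simpler model the maximum-likelihood central ordering is the one minimizing the total Kendall tau distance to the observed orderings, found here by brute force. The event labels A to D and the function names are hypothetical.

```python
from itertools import combinations, permutations

def kendall_tau(a, b):
    """Number of event pairs that orderings a and b place in opposite order."""
    pos_b = {event: i for i, event in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])

def central_ordering(orderings):
    """Brute-force search for the ordering closest to all subject orderings.

    Only feasible for a handful of events, since it scans all permutations.
    """
    events = orderings[0]
    return min(
        permutations(events),
        key=lambda c: sum(kendall_tau(c, o) for o in orderings),
    )

# Hypothetical subject-specific orderings of four biomarker events.
subject_orderings = [
    ("A", "B", "C", "D"),
    ("A", "C", "B", "D"),
    ("A", "B", "C", "D"),
    ("B", "A", "C", "D"),
    ("A", "B", "D", "C"),
]
central = central_ordering(subject_orderings)
total = sum(kendall_tau(central, o) for o in subject_orderings)
```

With these toy orderings every pairwise majority agrees with A, B, C, D, so that ordering is the unique minimizer. The brute-force search is only meant to show the objective and does not scale beyond a few events.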