Collaborating Authors



The Clinic for Special Children is not located in a thriving metropolis. It sits in rural, pastoral Strasburg, Pennsylvania, in the heart of Amish Country. The clinic has been the subject of many high-profile feature articles, most notably Scientific American's standout December 2015 piece, "Genomics for the People." Just this week came news that the clinic has made yet another breakthrough. Using DNA sequencing and an analysis of genealogical records, some dating back nearly 100 years, the clinic's scientists confirmed the gene TNNT1 as the culprit in a devastating pediatric disorder known as "Amish nemaline myopathy."

Generalized Similarity U: A Non-parametric Test of Association Based on Similarity

Machine Learning

Second-generation sequencing technologies are increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be a univariate disease status, a set of multivariate responses, or even a high-dimensional outcome. Treating the genotype and the phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. Here we propose a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first study the theoretical properties of the test in a general setting and then focus on its application to sequencing association studies. Based on theoretical analysis, we propose using a Laplacian-kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU had advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan of Alzheimer's Disease Neuroimaging Initiative (ADNI) data, identifying three genes, APOE, APOC1 and TOMM40, associated with the imaging phenotype. We developed a C++ package for analysis of whole genome sequencing data using GSU. The source code can be downloaded at
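To make the idea of a similarity-based association test concrete, here is a minimal Python sketch. It is not the authors' GSU implementation or their C++ package. It assumes a Laplacian kernel of the form exp(-L1 distance / scale), double-centers both similarity matrices, averages their off-diagonal products into a U-type statistic, and uses a permutation p-value in place of the asymptotic theory the abstract alludes to. All function names and parameters (`laplacian_similarity`, `scale`, `n_perm`, and so on) are hypothetical.

```python
import math
import random


def laplacian_similarity(rows, scale=1.0):
    """Pairwise Laplacian-kernel similarity exp(-L1 distance / scale) (assumed form)."""
    n = len(rows)
    return [
        [
            math.exp(-sum(abs(a - b) for a, b in zip(rows[i], rows[j])) / scale)
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(sim):
    """Double-center a symmetric similarity matrix."""
    n = len(sim)
    row_means = [sum(row) / n for row in sim]
    grand_mean = sum(row_means) / n
    return [
        [sim[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def u_statistic(geno_sim, pheno_sim):
    """Average product of the two similarities over all ordered pairs i != j."""
    n = len(geno_sim)
    total = sum(
        geno_sim[i][j] * pheno_sim[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def permutation_test(geno, pheno, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value); phenotype labels are shuffled."""
    rng = random.Random(seed)
    k = center(laplacian_similarity(geno))
    s = center(laplacian_similarity(pheno))
    observed = u_statistic(k, s)
    n = len(pheno)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        permuted = [[s[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if u_statistic(k, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented for illustration): 12 subjects, one variant coded 0/1/2,
# and a phenotype that tracks the genotype exactly.
geno = [[float(i % 3)] for i in range(12)]
pheno = [[g[0]] for g in geno]
observed, p_value = permutation_test(geno, pheno)
```

Because both objects enter only through their pairwise similarities, the same code accepts multivariate or high-dimensional phenotypes by passing longer rows in `pheno`.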

Before you send your spit to 23andMe, what you need to know

PBS NewsHour

The genetic testing company 23andMe received approval this week from regulators to sell genetic reports on an individual's risk for 10 diseases, most prominently Alzheimer's and Parkinson's. Before you send in your saliva sample and $199, here's what you should know. At most, the report will tell you that you carry a DNA variant that, according to research, is associated with a higher risk of a disease. For the rare clotting disorder hereditary thrombophilia, for instance, the report will say whether you carry a variant called Factor V Leiden in the F5 gene and a variant called Prothrombin G20210A in the F2 gene. If there's enough science to quantify the risk, the report will specify a percentage, like "your risk is 3 percent." If not, it will just say there's an (unspecified) increased risk.

High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

Neural Information Processing Systems

Alzheimer disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been used to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model exploits the temporal correlations in both the data features and the regression tasks through structured sparsity-inducing norms.
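As a rough illustration of what a structured sparsity-inducing norm does, the Python sketch below computes the ℓ2,1 norm of a feature-by-task weight matrix and its proximal operator (row-wise group soft-thresholding). This is one common norm of this kind; the excerpt does not say which norms the proposed model uses, so treat the choice, along with the names `l21_norm` and `prox_l21`, as assumptions made for illustration only.

```python
import math


def l21_norm(weights):
    """Sum of the Euclidean norms of the rows (one row per feature, one column per task)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)


def prox_l21(weights, threshold):
    """Shrink each row toward zero; rows with norm <= threshold are zeroed entirely."""
    shrunk = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk


# Toy weights (invented): the first feature matters for both tasks, the second barely at all.
weights = [[3.0, 4.0], [0.1, 0.0]]
penalty = l21_norm(weights)
shrunk = prox_l21(weights, threshold=1.0)
```

Because the penalty acts on whole rows, a weak feature is dropped for all tasks at once, which is how such a norm couples related regression tasks.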