Fraenkel, Ernest
Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description
Agurto, Carla, Cecchi, Guillermo, Wen, Bo, Fraenkel, Ernest, Berry, James, Navar, Indu, Norel, Raquel
Amyotrophic lateral sclerosis (ALS) is a fatal disease that affects not only movement, speech, and breathing but also cognition. Recent studies have focused on the use of language analysis techniques to detect ALS and infer scales for monitoring functional progression. In this paper, we focus on another important aspect, cognitive impairment, which affects 35-50% of the ALS population. In an effort to reach the ALS population, which frequently exhibits mobility limitations, we implemented the digital version of the Edinburgh Cognitive and Behavioral ALS Screen (ECAS) test for the first time. This test, which is designed to measure cognitive impairment, was performed remotely by 56 participants from the EverythingALS Speech Study. As part of the study, participants (ALS and non-ALS) were asked each week to describe one picture, drawn from a pool of pictures of complex scenes, displayed on their computer at home. We analyze the descriptions recorded within +/- 60 days of the day the ECAS test was administered and extract different types of linguistic and acoustic features. We input those features into linear regression models to infer 5 ECAS sub-scores and the total score. Speech samples from the picture description are reliable enough to predict the ECAS sub-scores, with the models achieving statistically significant Spearman correlation values between 0.32 and 0.51 under 10-fold cross-validation.
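The evaluation described above (linear regression scored by Spearman correlation under 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single hypothetical feature, purely synthetic data, and a one-variable least-squares fit, and all function names are invented for illustration.

```python
import random
from statistics import mean

def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def cross_val_predict(x, y, k=10, seed=0):
    """Predict each sample from a model trained on the other k-1 folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train],
                                    [y[i] for i in train])
        for i in fold:
            preds[i] = slope * x[i] + intercept
    return preds

# Synthetic stand-in data (assumption): 56 samples, one feature, noisy target.
rng = random.Random(1)
feature = [rng.gauss(0, 1) for _ in range(56)]
score = [2.0 * f + rng.gauss(0, 1) for f in feature]

preds = cross_val_predict(feature, score, k=10)
rho = spearman(preds, score)
print(f"cross-validated Spearman correlation: {rho:.2f}")
```

Because every prediction comes from a model that never saw that sample, the resulting correlation estimates out-of-sample performance rather than fit quality.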
Adaptive Bias Correction for Improved Subseasonal Forecasting
Mouatadid, Soukayna, Orenstein, Paulo, Flaspohler, Genevieve, Cohen, Judah, Oprescu, Miruna, Fraenkel, Ernest, Mackey, Lester
Subseasonal forecasting -- predicting temperature and precipitation 2 to 6 weeks ahead -- is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skills remain poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. Here, to counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. We show that, when applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% (over baseline skills of 0.18-0.25) and precipitation forecasting skill by 40-69% (over baseline skills of 0.11-0.15) in the contiguous U.S. We couple these performance improvements with a practical workflow to explain ABC skill gains and identify higher-skill windows of opportunity based on specific climate conditions.
Efficiently predicting high resolution mass spectra with graph neural networks
Murphy, Michael, Jegelka, Stefanie, Fraenkel, Ernest, Kind, Tobias, Healey, David, Butler, Thomas
The identification of unknown small molecules in complex chemical mixtures is a primary challenge in many areas of chemical and biological science. The standard high-throughput approach to small molecule identification is tandem mass spectrometry (MS/MS), with diverse applications including metabolomics [1], drug discovery [2], clinical diagnostics [3], forensics [4], and environmental monitoring [5]. The key bottleneck in MS/MS is structural elucidation: given a mass spectrum, we must determine the 2D structure of the molecule it represents. This problem is far from solved, and adversely impacts all areas of science that use MS/MS. Typically only 2-4% of spectra are identified in untargeted metabolomics experiments [6], and a recent competition saw no more than 30% accuracy [7]. Because MS/MS is a lossy measurement, and existing training sets are small, direct prediction of structures from spectra is particularly challenging. Therefore, the most common approach is spectral library search, which casts the problem as information retrieval [8]: an observed spectrum is queried against a library of spectra with known structures. This provides an informative prior, and has the advantage of easy interpretability as the entire space of solutions is known.
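Spectral library search as described above can be sketched as a nearest-neighbor lookup. The example below is illustrative only: it assumes cosine similarity over binned peaks as the scoring function (one common choice, not necessarily the one used in this work), and the compound names, peak lists, and bin width are invented.

```python
from math import sqrt

def bin_spectrum(peaks, bin_width=0.1):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def library_search(query_peaks, library, top_k=3):
    """Rank library entries by similarity to the query spectrum."""
    query = bin_spectrum(query_peaks)
    scored = [(cosine(query, bin_spectrum(peaks)), name)
              for name, peaks in library.items()]
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical library: names and peaks are made up for illustration.
library = {
    "compound_A": [(50.0, 10.0), (77.0, 100.0), (105.0, 40.0)],
    "compound_B": [(50.0, 100.0), (91.0, 80.0), (119.0, 20.0)],
}

# A noisy observation of compound_A with one spurious extra peak.
query = [(50.0, 12.0), (77.0, 95.0), (105.0, 35.0), (60.0, 5.0)]

hits = library_search(query, library)
for score, name in hits:
    print(f"{name}: {score:.3f}")
```

The sketch also shows the limitation the passage implies: a query can only ever be matched to a structure that is already in the library.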
Learned Benchmarks for Subseasonal Forecasting
Mouatadid, Soukayna, Orenstein, Paulo, Flaspohler, Genevieve, Oprescu, Miruna, Cohen, Judah, Wang, Franklyn, Knight, Sean, Geogdzhayeva, Maria, Levang, Sam, Fraenkel, Ernest, Mackey, Lester
We develop a subseasonal forecasting toolkit of simple learned benchmark models that outperform both operational practice and state-of-the-art machine learning and deep learning methods. Our new models include (a) Climatology++, an adaptive alternative to climatology that, for precipitation, is 9% more accurate and 250% more skillful than the United States operational Climate Forecasting System (CFSv2); (b) CFSv2++, a learned CFSv2 correction that improves temperature and precipitation accuracy by 7-8% and skill by 50-275%; and (c) Persistence++, an augmented persistence model that combines CFSv2 forecasts with lagged measurements to improve temperature and precipitation accuracy by 6-9% and skill by 40-130%. Across the contiguous U.S., our Climatology++, CFSv2++, and Persistence++ toolkit consistently outperforms standard meteorological baselines, state-of-the-art machine and deep learning methods, and the European Centre for Medium-Range Weather Forecasts ensemble. Overall, we find that augmenting traditional forecasting approaches with learned enhancements yields an effective and computationally inexpensive strategy for building the next generation of subseasonal forecasting benchmarks.
Graph-Sparse Logistic Regression
LeNail, Alexander, Schmidt, Ludwig, Li, Johnathan, Ehrenberger, Tobias, Sachs, Karen, Jegelka, Stefanie, Fraenkel, Ernest
We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.
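For context, the L1-regularized logistic regression baseline mentioned above can be sketched with proximal gradient descent (soft thresholding). This illustrates the baseline only, not GSLR or the GSLR package: the data are synthetic, there is no intercept term, and the function names and hyperparameters (`lam`, `lr`, `steps`) are assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(X, y, lam=0.15, lr=0.5, steps=200):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(d):
                grad[j] += err * row[j] / n
        # Gradient step on the smooth loss, then shrink each weight.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad)]
    return w

# Synthetic data (assumption): only feature 0 carries signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(300)]
y = [1 if row[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]

w = fit_l1_logistic(X, y)
print([round(v, 3) for v in w])
```

The L1 penalty drives uninformative weights to exactly zero but is blind to any graph over the features, which is the gap a graph-sparse method targets by additionally requiring the selected support to be connected.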
Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model
Huang, Furong, Anandkumar, Animashree, Borgs, Christian, Chayes, Jennifer, Fraenkel, Ernest, Hawrylycz, Michael, Lein, Ed, Ingrosso, Alessandro, Turaga, Srinivas
Cataloging the neuronal cell types that comprise the circuitry of individual brain regions is a major goal of modern neuroscience and the BRAIN initiative. Single-cell RNA sequencing can now be used to measure the gene expression profiles of individual neurons and to categorize neurons based on their gene expression profiles. While single-cell techniques are extremely powerful and hold great promise, they are currently still labor intensive, have a high cost per cell, and, most importantly, do not provide information on the spatial distribution of cell types in specific regions of the brain. We propose a complementary approach that uses computational methods to infer the cell types and their gene expression profiles through analysis of brain-wide single-cell resolution in situ hybridization (ISH) imagery contained in the Allen Brain Atlas (ABA). We measure the spatial distribution of neurons labeled in the ISH image for each gene and model it as a spatial point process mixture, whose mixture weights are given by the cell types which express that gene. By fitting a point process mixture model jointly to the ISH images, we infer both the spatial point process distribution for each cell type and its gene expression profile. We validate our predictions of cell type-specific gene expression profiles using recently published single-cell RNA sequencing data for the mouse somatosensory cortex. Jointly with the gene expression profiles, cell features such as cell size, orientation, intensity, and local density level are inferred per cell type.