Performance Analysis
Locating Hidden Exoplanets in ALMA Data Using Machine Learning
Terry, Jason, Hall, Cassandra, Abreau, Sean, Gleyzer, Sergei
Exoplanets in protoplanetary disks cause localized deviations from Keplerian velocity in channel maps of molecular line emission. Current methods of characterizing these deviations are time consuming, and there is no unified standard approach. We demonstrate that machine learning can quickly and accurately detect the presence of planets. We train our model on synthetic images generated from simulations and apply it to real observations to identify forming planets in real systems. Machine learning methods, based on computer vision, are not only capable of correctly identifying the presence of one or more planets, but they can also correctly constrain the location of those planets.
POPDx: An Automated Framework for Patient Phenotyping across 392,246 Individuals in the UK Biobank Study
Yang, Lu, Wang, Sheng, Altman, Russ B.
Objective For the UK Biobank standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants. Materials and Methods POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1,538 phenotype codes. We extracted phenotypic and health-related information of 392,246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12,803 ICD-10 diagnosis codes of the patients were converted to 1,538 Phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multi-phenotype recognition. Results POPDx can predict phenotypes that are rare or even unobserved in training. We demonstrate substantial improvement of automated multi-phenotype recognition across 22 disease categories, and its application in identifying key epidemiological features associated with each phenotype. Conclusions POPDx helps provide well-defined cohorts for downstream studies. It is a general purpose method that can be applied to other biobanks with diverse but incomplete data.
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired
Ohata, Kazuya, Kitada, Shunsuke, Iyatomi, Hitoshi
We propose a simple yet effective image captioning framework that can determine the quality of an image and notify the user of the reasons for any flaws in the image. Our framework first determines the quality of images and then generates captions using only those images that are determined to be of high quality. The user is notified by the flaws feature to retake if image quality is low, and this cycle is repeated until the input image is deemed to be of high quality. As a component of the framework, we trained and evaluated a low-quality image detection model that simultaneously learns difficulty in recognizing images and individual flaws, and we demonstrated that our proposal can explain the reasons for flaws with a sufficient score. We also evaluated a dataset with low-quality images removed by our framework and found improved values for all four common metrics (e.g., BLEU-4, METEOR, ROUGE-L, CIDEr), confirming an improvement in general-purpose image captioning capability. Our framework would assist the visually impaired, who have difficulty judging image quality.
Hey ASR System! Why Aren't You More Inclusive? Automatic Speech Recognition Systems' Bias and Proposed Bias Mitigation Techniques. A Literature Review
Ngueajio, Mikel K., Washington, Gloria
Speech is the fundamental means of communication between humans. The advent of AI and sophisticated speech technologies have led to the rapid proliferation of human-to-computer-based interactions, fueled primarily by Automatic Speech Recognition (ASR) systems. ASR systems normally take human speech in the form of audio and convert it into words, but for some users, it cannot decode the speech, and any output text is filled with errors that are incomprehensible to the human reader. These systems do not work equally for everyone and actually hinder the productivity of some users. In this paper, we present research that addresses ASR biases against gender, race, and the sick and disabled, while exploring studies that propose ASR debiasing techniques for mitigating these discriminations. We also discuss techniques for designing a more accessible and inclusive ASR technology. For each approach surveyed, we also provide a summary of the investigation and methods applied, the ASR systems and corpora used, and the research findings, and highlight their strengths and/or weaknesses. Finally, we propose future opportunities for Natural Language Processing researchers to explore in the next level creation of ASR technologies.
Multi-block Min-max Bilevel Optimization with Applications in Multi-task Deep AUC Maximization
Hu, Quanqi, Zhong, Yongjian, Yang, Tianbao
In this paper, we study multi-block min-max bilevel optimization problems, where the upper level is non-convex strongly-concave minimax objective and the lower level is a strongly convex objective, and there are multiple blocks of dual variables and lower level problems. Due to the intertwined multi-block min-max bilevel structure, the computational cost at each iteration could be prohibitively high, especially with a large number of blocks. To tackle this challenge, we present a single-loop randomized stochastic algorithm, which requires updates for only a constant number of blocks at each iteration. Under some mild assumptions on the problem, we establish its sample complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary point. This matches the optimal complexity for solving stochastic nonconvex optimization under a general unbiased stochastic oracle model. Moreover, we provide two applications of the proposed method in multi-task deep AUC (area under ROC curve) maximization and multi-task deep partial AUC maximization. Experimental results validate our theory and demonstrate the effectiveness of our method on problems with hundreds of tasks.
Pi theorem formulation of flood mapping
Bartlett, Mark S., Van Blitterswyk, Jared, Farella, Martha, Li, Jinshu, Smith, Curtis, Mrad, Assaad
While physical phenomena are stated in terms of physical laws that are homogeneous in all dimensions, the mechanisms and patterns of the physical phenomena are independent of the form of the units describing the physical process. Accordingly, across different conditions, the similarity of a process may be captured through a dimensionless reformulation of the physical problem with Buckingham $\Pi$ theorem. Here, we apply Buckingham $\Pi$ theorem for creating dimensionless indices for capturing the similarity of the flood process, and in turn, these indices allow machine learning to map the likelihood of pluvial (flash) flooding over a landscape. In particular, we use these dimensionless predictors with a logistic regression machine learning (ML) model for a probabilistic determination of flood risk. The logistic regression derived flood maps compare well to 2D hydraulic model results that are the basis of the Federal Emergency Management Agency (FEMA) maps. As a result, the indices and logistic regression also provide the potential to expand existing FEMA maps to new (unmapped) areas and a wider spectrum of flood flows and precipitation events. Our results demonstrate that the new dimensionless indices capture the similarity of the flood process across different topographies and climate regions. Consequently, these dimensionless indices may expand observations of flooding (e.g., satellite) to the risk of flooding in new areas, as well as provide a basis for the rapid, real-time estimation of flood risk on a worldwide scale.
Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN
Ovalle, Anaelia, Dev, Sunipa, Zhao, Jieyu, Sarrafzadeh, Majid, Chang, Kai-Wei
Auditing machine learning-based (ML) healthcare tools for bias is critical to preventing patient harm, especially in communities that disproportionately face health inequities. General frameworks are becoming increasingly available to measure ML fairness gaps between groups. However, ML for health (ML4H) auditing principles call for a contextual, patient-centered approach to model assessment. Therefore, ML auditing tools must be (1) better aligned with ML4H auditing principles and (2) able to illuminate and characterize communities vulnerable to the most harm. To address this gap, we propose supplementing ML4H auditing frameworks with SLOGAN (patient Severity-based LOcal Group biAs detectioN), an automatic tool for capturing local biases in a clinical prediction task. SLOGAN adapts an existing tool, LOGAN (LOcal Group biAs detectioN), by contextualizing group bias detection in patient illness severity and past medical history. We investigate and compare SLOGAN's bias detection capabilities to LOGAN and other clustering techniques across patient subgroups in the MIMIC-III dataset. On average, SLOGAN identifies larger fairness disparities in over 75% of patient groups than LOGAN while maintaining clustering quality. Furthermore, in a diabetes case study, health disparity literature corroborates the characterizations of the most biased clusters identified by SLOGAN. Our results contribute to the broader discussion of how machine learning biases may perpetuate existing healthcare disparities.
Testing for context-dependent changes in neural encoding in naturalistic experiments
Chen, Yenho, Harris, Carl W., Ma, Xiaoyu, Li, Zheng, Pereira, Francisco, Zheng, Charles Y.
We propose a decoding-based approach to detect context effects on neural codes in longitudinal neural recording data. The approach is agnostic to how information is encoded in neural activity, and can control for a variety of possible confounding factors present in the data. We demonstrate our approach by determining whether it is possible to decode location encoding from prefrontal cortex in the mouse and, further, testing whether the encoding changes due to task engagement.
SleepMore: Inferring Sleep Duration at Scale via Multi-Device WiFi Sensing
Zakaria, Camellia, Yilmaz, Gizem, Mammen, Priyanka, Chee, Michael, Shenoy, Prashant, Balan, Rajesh
The availability of commercial wearable trackers equipped with features to monitor sleep duration and quality has enabled more useful sleep health monitoring applications and analyses. However, much research has reported the challenge of long-term user retention in sleep monitoring through these modalities. Since modern Internet users own multiple mobile devices, our work explores the possibility of employing ubiquitous mobile devices and passive WiFi sensing techniques to predict sleep duration as the fundamental measure for complementing long-term sleep monitoring initiatives. In this paper, we propose SleepMore, an accurate and easy-to-deploy sleep-tracking approach based on machine learning over the user's WiFi network activity. It first employs a semi-personalized random forest model with an infinitesimal jackknife variance estimation method to classify a user's network activity behavior into sleep and awake states per minute granularity. Through a moving average technique, the system uses these state sequences to estimate the user's nocturnal sleep period and its uncertainty rate. Uncertainty quantification enables SleepMore to overcome the impact of noisy WiFi data that can yield large prediction errors. We validate SleepMore using data from a month-long user study involving 46 college students and draw comparisons with the Oura Ring wearable. Beyond the college campus, we evaluate SleepMore on non-student users of different housing profiles. Our results demonstrate that SleepMore produces statistically indistinguishable sleep statistics from the Oura ring baseline for predictions made within a 5% uncertainty rate. These errors range between 15-28 minutes for determining sleep time and 7-29 minutes for determining wake time, proving statistically significant improvements over prior work. Our in-depth analysis explains the sources of errors.
Can Strategic Data Collection Improve the Performance of Poverty Prediction Models?
Soman, Satej, Aiken, Emily, Rolf, Esther, Blumenstock, Joshua
Machine learning-based estimates of poverty and wealth are increasingly being used to guide the targeting of humanitarian aid and the allocation of social assistance. However, the ground truth labels used to train these models are typically borrowed from existing surveys that were designed to produce national statistics -- not to train machine learning models. Here, we test whether adaptive sampling strategies for ground truth data collection can improve the performance of poverty prediction models. Through simulations, we compare the status quo sampling strategies (uniform at random and stratified random sampling) to alternatives that prioritize acquiring training data based on model uncertainty or model performance on sub-populations. Perhaps surprisingly, we find that none of these active learning methods improve over uniform-at-random sampling. We discuss how these results can help shape future efforts to refine machine learning-based estimates of poverty.