Balagopalan, Aparna
What's in a Query: Polarity-Aware Distribution-Based Fair Ranking
Balagopalan, Aparna, Wang, Kai, Salaudeen, Olawale, Biega, Asia, Ghassemi, Marzyeh
Machine learning-driven rankings, where individuals (or items) are ranked in response to a query, mediate search exposure or attention in a variety of safety-critical settings. It is therefore important to ensure that such rankings are fair. Under the goal of equal opportunity, the attention allocated to an individual on a ranking interface should be proportional to their relevance across search queries. In this work, we examine amortized fair ranking, where relevance and attention are accumulated over a sequence of user queries to make fair ranking more feasible in practice. First, unlike prior methods that operate on expected amortized attention for each individual, we define new divergence-based measures for attention distribution-based fairness in ranking (DistFaiR), characterizing unfairness as the divergence between the distributions of attention and relevance accrued by an individual over time. This allows us to propose new definitions of unfairness that are more reliable at test time. Second, we prove that group fairness is upper-bounded by individual fairness under this definition for a useful class of divergence measures, and show experimentally that maximizing individual fairness through an integer linear programming-based optimization often benefits group fairness. Lastly, we find that prior research in amortized fair ranking ignores critical information about queries, potentially leading to a fairwashing risk in practice by making rankings appear fairer than they actually are.
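A minimal sketch of the distribution-based idea described above, assuming a KL divergence between an individual's normalized attention and relevance histories; the paper's exact divergence measures, normalization, and polarity handling are not reproduced here.

```python
# Illustrative sketch (not the paper's exact formulation): individual-level
# unfairness as a divergence between the distribution of attention and the
# distribution of relevance an individual receives over a sequence of queries.
import numpy as np

def distribution_unfairness(attention, relevance, eps=1e-12):
    """KL divergence between normalized attention and relevance histories.

    attention, relevance: non-negative arrays of length T (one entry per query).
    A value near 0 means attention tracks relevance proportionally over time.
    """
    a = np.asarray(attention, dtype=float) + eps
    r = np.asarray(relevance, dtype=float) + eps
    a /= a.sum()
    r /= r.sum()
    return float(np.sum(a * np.log(a / r)))

# Example: attention roughly proportional to relevance vs. heavily skewed attention.
print(distribution_unfairness([0.2, 0.3, 0.5], [0.25, 0.3, 0.45]))  # small divergence
print(distribution_unfairness([0.9, 0.05, 0.05], [0.2, 0.4, 0.4]))  # large divergence
```

In an amortized setting, such a per-individual divergence could be aggregated across all ranked individuals and constrained or minimized by the ranking optimizer; the choice of KL here is purely illustrative.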
Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium
Jeong, Hyewon, Jabbour, Sarah, Yang, Yuzhe, Thapta, Rahul, Mozannar, Hussein, Han, William Jongwon, Mehandru, Nikita, Wornow, Michael, Lialin, Vladislav, Liu, Xin, Lozano, Alejandro, Zhu, Jiacheng, Kocielnik, Rafal Dariusz, Harrigian, Keith, Zhang, Haoran, Lee, Edward, Vukadinovic, Milos, Balagopalan, Aparna, Jeanselme, Vincent, Matton, Katherine, Demirel, Ilker, Fries, Jason, Rashidi, Parisa, Beaulieu-Jones, Brett, Xu, Xuhai Orson, McDermott, Matthew, Naumann, Tristan, Agrawal, Monica, Zitnik, Marinka, Ustun, Berk, Choi, Edward, Yeom, Kristen, Gursoy, Gamze, Ghassemi, Marzyeh, Pierson, Emma, Chen, George, Kanjilal, Sanjat, Oberst, Michael, Zhang, Linying, Singh, Harvineet, Hartvigsen, Tom, Zhou, Helen, Okolo, Chinasa T.
The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2023. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with an interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review, summarizing recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.
Event-Based Contrastive Learning for Medical Time Series
Jeong, Hyewon, Oufattole, Nassim, McDermott, Matthew, Balagopalan, Aparna, Jangeesingh, Bryan, Ghassemi, Marzyeh, Stultz, Collin
In clinical practice, one often needs to identify whether a patient is at high risk of adverse outcomes after some key medical event; for example, the short-term risk of death after an admission for heart failure. This task is challenging due to the complexity, variability, and heterogeneity of longitudinal medical data, especially for individuals suffering from chronic diseases like heart failure. In this paper, we introduce Event-Based Contrastive Learning (EBCL), a method for learning embeddings of heterogeneous patient data that preserves temporal information before and after key index events. We demonstrate that EBCL produces models that yield better fine-tuning performance on critical downstream tasks for a heart failure cohort, including 30-day readmission, 1-year mortality, and 1-week length of stay, relative to other pretraining methods. Our findings also reveal that EBCL pretraining alone can effectively cluster patients with similar mortality and readmission risks, offering valuable insights for clinical decision-making and personalized patient care.
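A minimal sketch of an event-based contrastive objective in the spirit of the abstract, assuming an InfoNCE-style loss: for each patient, the embedding of data observed before a key index event is pulled toward the embedding of data observed after that event (positive pair) and pushed away from other patients' post-event embeddings in the batch (negatives). The encoder architecture, temperature, and pairing scheme are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def event_contrastive_loss(pre_event_emb, post_event_emb, temperature=0.1):
    """pre_event_emb, post_event_emb: (batch, dim) embeddings for the same patients,
    computed from data before and after each patient's key index event."""
    pre = F.normalize(pre_event_emb, dim=-1)
    post = F.normalize(post_event_emb, dim=-1)
    logits = pre @ post.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(pre.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with random stand-in embeddings (in practice these would come from
# encoders over longitudinal patient data around the index event):
loss = event_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```

Pretraining with such an objective yields patient embeddings that can then be fine-tuned for downstream tasks such as readmission or mortality prediction, as described above.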
The Role of Relevance in Fair Ranking
Balagopalan, Aparna, Jacobs, Abigail Z., Biega, Asia
Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates in hiring platforms, or sellers in a marketplace. In order to do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because these constructs are typically not directly observable, platforms must instead resort to using proxy scores such as relevance and infer them from behavioral signals such as searcher clicks. Yet, it remains an open question whether relevance fulfills its role as such a worthiness score in high-stakes fair rankings. In this paper, we combine perspectives and tools from the social sciences, information retrieval, and fairness in machine learning to derive a set of desired criteria that relevance scores should satisfy in order to meaningfully guide fairness interventions. We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data. We assess the impact of these violations on the estimated system fairness and analyze whether existing fairness interventions may mitigate the identified issues. Our analyses and results surface the pressing need for new approaches to relevance collection and generation that are suitable for use in fair ranking.
The Effect of Heterogeneous Data for Alzheimer's Disease Detection from Speech
Balagopalan, Aparna, Novikova, Jekaterina, Rudzicz, Frank, Ghassemi, Marzyeh
Speech datasets for identifying Alzheimer's disease (AD) are generally restricted to participants performing a single task, e.g., describing an image shown to them. As a result, models trained on linguistic features derived from such datasets may not generalize across tasks. Building on prior work demonstrating that same-task data from healthy participants helps improve AD detection on a single-task dataset of pathological speech, we augment an AD-specific dataset of subjects describing a picture with multi-task healthy data. We demonstrate that normative data from multiple speech-based tasks improves AD detection by up to 9%. Visualization of decision boundaries reveals that models trained on a combination of structured picture descriptions and unstructured conversational speech have the lowest out-of-task error and show the most potential to generalize to multiple tasks. We analyze the impact of the age of the added samples and whether it affects fairness in classification. We also provide explanations for a possible inductive bias effect across tasks using model-agnostic feature anchors. This work highlights the need for heterogeneous datasets that encode changes in multiple facets of cognition and for developing task-independent AD detection models.
ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs
Balagopalan, Aparna, Gorti, Satya, Ravaut, Mathieu, Saqur, Raeid
Generative Adversarial Networks (GANs) have seen a steep ascension to the peak of the ML research zeitgeist in recent years. Catalyzed largely by their success in image generation, the technique has been adopted across a wide range of other problem domains. Although GANs have had considerable success in producing more realistic images than other approaches, they have seen only limited use for text sequences, and generating longer sequences compounds this problem. Most recently, SeqGAN (Yu et al., 2017) has shown improvements in adversarial evaluation and in human evaluation compared to an MLE-trained baseline. The main contributions of this paper are three-fold: 1. we show results for sequence generation using a GAN architecture with efficient policy gradient estimators; 2. we attain improved training stability; and 3. we perform a comparative study of recent unbiased, low-variance gradient estimation techniques such as REBAR (Tucker et al., 2017), RELAX (Grathwohl et al., 2018), and REINFORCE (Williams, 1992). Using a simple grammar on synthetic datasets with varying sequence lengths, we assess the quality of the sequences generated by the model.
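A minimal sketch of one REINFORCE-style policy-gradient update for a sequence-GAN generator, in the spirit of the estimators compared above; the actual ReGAN architectures, control variates, and REBAR/RELAX estimators are not reproduced here. The generator interface (init_state, per-step call), the start-token convention, and the discriminator's reward signal are hypothetical stand-ins.

```python
import torch

def reinforce_generator_step(generator, discriminator, optimizer,
                             batch_size, seq_len, vocab_size):
    """One illustrative REINFORCE update: sample sequences token by token,
    score them with the discriminator, and weight log-probabilities by the
    (baseline-subtracted) reward."""
    tokens, log_probs = [], []
    state = generator.init_state(batch_size)            # assumed helper on the generator
    inp = torch.zeros(batch_size, dtype=torch.long)     # start-token id 0 (assumption)
    for _ in range(seq_len):
        logits, state = generator(inp, state)           # (batch, vocab_size) next-token logits
        dist = torch.distributions.Categorical(logits=logits)
        inp = dist.sample()
        tokens.append(inp)
        log_probs.append(dist.log_prob(inp))
    sequences = torch.stack(tokens, dim=1)              # (batch, seq_len)
    with torch.no_grad():
        reward = discriminator(sequences)                # e.g. probability sequence is "real"
    baseline = reward.mean()                             # simple variance-reduction baseline
    seq_log_prob = torch.stack(log_probs, dim=1).sum(dim=1)
    loss = -((reward - baseline) * seq_log_prob).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

REBAR and RELAX replace the simple mean baseline with learned, reparameterization-based control variates to reduce the variance of this gradient estimate while keeping it unbiased.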