underrepresented group
Emergence of Hierarchical Emotion Organization in Large Language Models
Zhao, Bo, Okawa, Maya, Bigelow, Eric J., Yu, Rose, Ullman, Tomer, Lubana, Ekdeep Singh, Tanaka, Hidenori
As large language models (LLMs) increasingly power conversational agents, understanding how they model users' emotional states is critical for ethical deployment. Inspired by emotion wheels -- a psychological framework that argues emotions organize hierarchically -- we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
- Education > Educational Setting > Higher Education (0.46)
Improving Equity in Health Modeling with GPT4-Turbo Generated Synthetic Data: A Comparative Study
Smolyak, Daniel, Welivita, Arshana, Bjarnadóttir, Margrét V., Agarwal, Ritu
Objective. Demographic groups are often represented at different rates in medical datasets. These differences can create bias in machine learning algorithms, with higher levels of performance for better-represented groups. One promising solution to this problem is to generate synthetic data to mitigate potential adverse effects of non-representative datasets. Methods. We build on recent advances in LLM-based synthetic data generation to create a pipeline where the synthetic data is generated separately for each demographic group. We conduct our study using the MIMIC-IV and Framingham "Offspring and OMNI-1 Cohorts" datasets. We prompt GPT4-Turbo to create group-specific data, providing training examples and the dataset context. An exploratory analysis is conducted to ascertain the quality of the generated data. We then evaluate the utility of the synthetic data for augmentation of a training dataset in a downstream machine learning task, focusing specifically on model performance metrics across groups. Results. The performance of GPT4-Turbo augmentation is generally, but not always, superior. In the majority of experiments our method outperforms standard modeling baselines; however, prompting GPT4-Turbo to produce data specific to a group provides little to no additional benefit over a prompt that does not specify the group. Conclusion. We developed a method for using LLMs out-of-the-box to synthesize group-specific data to address imbalances in demographic representation in medical datasets. As another "tool in the toolbox", this method can improve model fairness and thus health equity. More research is needed to understand the conditions under which LLM-generated synthetic data is useful for non-representative medical datasets.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
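The abstract above describes generating synthetic records separately for each demographic group to offset unequal representation. As a rough illustration only, the sketch below counts how many synthetic rows each group would need in order to match the largest group. The balancing target (matching the largest group) and the function name `synthetic_rows_needed` are assumptions made for this example; the abstract does not say how many rows the authors generate per group.

```python
from collections import Counter


def synthetic_rows_needed(group_labels):
    """Return, per group, how many extra rows would bring it up to the
    size of the largest group.

    Assumption for illustration: "balanced" means every group has as many
    rows as the largest one. This is not taken from the paper.
    """
    counts = Counter(group_labels)
    target = max(counts.values())
    return {group: target - n for group, n in counts.items()}


# Toy group labels (hypothetical, not from MIMIC-IV or Framingham).
labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
needed = synthetic_rows_needed(labels)
print(needed)
```

The per-group counts could then be passed to whatever generator produces the synthetic rows; that step is deliberately left out here because it depends on the prompting pipeline.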
Debiasing Cardiac Imaging with Controlled Latent Diffusion Models
Skorupko, Grzegorz, Osuala, Richard, Szafranowska, Zuzanna, Kushibar, Kaisar, Aung, Nay, Petersen, Steffen E, Lekadir, Karim, Gkontra, Polyxeni
The progress in deep learning solutions for disease diagnosis and prognosis based on cardiac magnetic resonance imaging is hindered by highly imbalanced and biased training data. To address this issue, we propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data based on sensitive attributes such as sex, age, body mass index, and health condition. We adopt ControlNet based on a denoising diffusion probabilistic model to condition on text assembled from patient metadata and cardiac geometry derived from segmentation masks using a large-cohort study, specifically, the UK Biobank. We assess our method by evaluating the realism of the generated images using established quantitative metrics. Furthermore, we conduct a downstream classification task aimed at debiasing a classifier by rectifying imbalances within underrepresented groups through synthetically generated samples. Our experiments demonstrate the effectiveness of the proposed approach in mitigating dataset imbalances, such as the scarcity of younger patients or of individuals with a normal BMI who suffer from heart failure. This work represents a major step towards the adoption of synthetic data for the development of fair and generalizable models for medical classification tasks. Notably, we conduct all our experiments using a single, consumer-level GPU to highlight the feasibility of our approach within resource-constrained environments.
- Europe > Switzerland (0.05)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population
Parikh, Harsh, Ross, Rachael, Stuart, Elizabeth, Rudolph, Kara
Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, ensuring more precise treatment effect estimations. Notably, ROOT generates interpretable characteristics of the underrepresented population, aiding researchers in effective communication. Our approach demonstrates improved precision and interpretability compared to alternatives, as illustrated with synthetic data experiments. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- investigating the effectiveness of medication for opioid use disorder -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.
- North America > United States > South Carolina (0.04)
- North America > United States > Oregon (0.04)
- North America > United States > District of Columbia (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Interactive robots as inclusive tools to increase diversity in higher education
There is a major lack of diversity in engineering, technology, and computing subjects in higher education. The resulting underrepresentation of some population groups contributes largely to gender and ethnicity pay gaps and social disadvantages. We aim to increase the diversity among students in such subjects by investigating the use of interactive robots as a tool that can get prospective students from different backgrounds interested in robotics as their field of study. For that, we will survey existing solutions that have proven to be successful in engaging underrepresented groups with technical subjects in educational settings. Moreover, we examine two recent outreach events at the University of Hertfordshire against inclusivity criteria. Based on that, we suggest specific activities for higher education institutions that follow an inclusive approach using interactive robots to attract prospective students at open days and other outreach events. Our suggestions provide tangible actions that can be easily implemented by higher education institutions to make technical subjects more appealing to everyone and thereby tackle inequalities in student uptake.
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Europe > United Kingdom > England > Hertfordshire > Hatfield (0.04)
- Europe > Norway (0.04)
- Research Report (1.00)
- Instructional Material (1.00)
Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play
Liu, Jeremiah Zhe, Dvijotham, Krishnamurthy Dj, Lee, Jihyeon, Yuan, Quan, Strobel, Martin, Lakshminarayanan, Balaji, Ramachandran, Deepak
Standard empirical risk minimization (ERM) training can produce deep neural network (DNN) models that are accurate on average but under-perform in under-represented population subgroups, especially when there are imbalanced group distributions in the long-tailed training data. Therefore, approaches that improve the accuracy-group robustness trade-off frontier of a DNN model (i.e., improving worst-group accuracy without sacrificing average accuracy, or vice versa) are of crucial importance. Uncertainty-based active learning (AL) can potentially improve the frontier by preferentially sampling underrepresented subgroups to create a more balanced training dataset. However, the quality of uncertainty estimates from modern DNNs tends to degrade in the presence of spurious correlations and dataset bias, compromising the effectiveness of AL for sampling tail groups. In this work, we propose Introspective Self-play (ISP), a simple approach to improve the uncertainty estimation of a deep neural network under dataset bias, by adding an auxiliary introspection task requiring a model to predict the bias for each data point in addition to the label. We show that ISP provably improves the bias-awareness of the model representation and the resulting uncertainty estimates. On two real-world tabular and language tasks, ISP serves as a simple "plug-in" for AL model training, consistently improving both the tail-group sampling rate and the final accuracy-fairness trade-off frontier of popular AL methods.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
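The abstract above contrasts average accuracy with worst-group accuracy. A minimal sketch of how the two quantities differ is given below, using toy labels and a hypothetical helper named `average_and_worst_group_accuracy`; it illustrates the metrics only and is not the ISP method or the authors' evaluation code.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def average_and_worst_group_accuracy(y_true, y_pred, groups):
    """Return (average accuracy, worst-group accuracy, per-group accuracies)."""
    per_group = group_accuracies(y_true, y_pred, groups)
    hits = sum(int(t == p) for t, p in zip(y_true, y_pred))
    average = hits / len(y_true)
    worst = min(per_group.values())
    return average, worst, per_group


# Toy data (hypothetical): eight majority-group points, two minority-group points.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
groups = ["majority"] * 8 + ["minority"] * 2

average, worst, per_group = average_and_worst_group_accuracy(y_true, y_pred, groups)
print(average, worst, per_group)
```

In this toy case the model is right on every majority-group point and wrong on both minority-group points, so a high average hides a complete failure on the smaller group, which is the gap the accuracy-group robustness frontier is meant to expose.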
Why Technology Alone Can't Solve AI's Bias Problem - HBS Working Knowledge
In a cluttered online world, few can resist the convenience of an automated ranking when deciding what movie to watch on Netflix or which seafood restaurant looks promising in a Google search. But when it comes to finding a job candidate or someone to do a basic household task, there's often a human toll to letting algorithms do the work. Searches on popular recruiting sites might seem like a neutral way to find prospective candidates, but their underlying technology can reinforce biases by excluding underrepresented groups, including women. For instance, research shows that women receive fewer employment reviews on the popular online freelancing site TaskRabbit compared to men with the same experience--and this lack of reviews can lower the rankings of women in talent search algorithms. "Maybe there is a bias from people who have been traditionally hiring men," explains Himabindu Lakkaraju, an assistant professor at Harvard Business School.
MIT Schwarzman College of Computing unveils Break Through Tech AI
Aimed at driving diversity and inclusion in artificial intelligence, the MIT Stephen A. Schwarzman College of Computing is launching Break Through Tech AI, a new program to bridge the talent gap for women and underrepresented genders in AI positions in industry. Break Through Tech AI will provide skills-based training, industry-relevant portfolios, and mentoring to qualified undergraduate students in the Greater Boston area in order to position them more competitively for careers in data science, machine learning, and artificial intelligence. The free, 18-month program will also provide each student with a stipend for participation to lower the barrier for those typically unable to engage in an unpaid, extra-curricular educational opportunity. "Helping position students from diverse backgrounds to succeed in fields such as data science, machine learning, and artificial intelligence is critical for our society's future," says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and Henry Ellis Warren Professor of Electrical Engineering and Computer Science. "We look forward to working with students from across the Greater Boston area to provide them with skills and mentorship to help them find careers in this competitive and growing industry."
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)
- North America > United States > California > Los Angeles County > Los Angeles (0.07)
- North America > United States > New York (0.05)