group difference
Neural Responses to Affective Sentences Reveal Signatures of Depression
Kommineni, Aditya, Jeong, Woojae, Avramidis, Kleanthis, McDaniel, Colin, Hughes, Myzelle, McGee, Thomas, Kaiser, Elsi, Lerman, Kristina, Blank, Idan A., Byrd, Dani, Habibi, Assal, Cahn, B. Rael, Kadiri, Sudarsana, Medani, Takfarinas, Leahy, Richard M., Narayanan, Shrikanth
Depression is one of the most prevalent mental health disorders worldwide, with estimates indicating that around 5% of the worlds' adult population [1, 2] suffers from this condition. The primary methods for screening and monitoring depression rely on self-reported questionnaires, such as the Patient Health Questionnaire (PHQ-9) [3], Beck's Depression Inventory (BDI) [4] and Hamilton Depression Ratings Scale (HDRS) [5]. While these questionnaires are effective to varying degrees at screening patients for depression, they provide only limited information about the affected underlying neuro-cognitive processes in individuals, limiting the ability to personalize treatments. Given the heterogeneity of depressive symptomatology across patient populations [6, 7], it is crucial to elucidate the underlying neurophysiological mechanisms to support the development of more effective and individualized procedures for screening, monitoring, and treatment. Prior functional imaging studies have identified increased activity in anterior cin-gulate cortex (especially the subgenual anterior cingulate) during presentation of emotional stimuli, altered connectivity in prefrontal cortical areas, and default mode network as potential differentiating markers in depressed participants [8-13].
Copula-Linked Parallel ICA: A Method for Coupling Structural and Functional MRI brain Networks
Agcaoglu, Oktay, Silva, Rogers F., Alacam, Deniz, Plis, Sergey, Adali, Tulay, Calhoun, Vince
Different brain imaging modalities offer unique insights into brain function and structure. Combining them enhances our understanding of neural mechanisms. Prior multimodal studies fusing functional MRI (fMRI) and structural MRI (sMRI) have shown the benefits of this approach. Since sMRI lacks temporal data, existing fusion methods often compress fMRI temporal information into summary measures, sacrificing rich temporal dynamics. Motivated by the observation that covarying networks are identified in both sMRI and resting-state fMRI, we developed a novel fusion method, by combining deep learning frameworks, copulas and independent component analysis (ICA), named copula linked parallel ICA (CLiP-ICA). This method estimates independent sources for each modality and links the spatial sources of fMRI and sMRI using a copula-based model for more flexible integration of temporal and spatial data. We tested CLiP-ICA using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our results showed that CLiP-ICA effectively captures both strongly and weakly linked sMRI and fMRI networks, including the cerebellum, sensorimotor, visual, cognitive control, and default mode networks. It revealed more meaningful components and fewer artifacts, addressing the long-standing issue of optimal model order in ICA. CLiP-ICA also detected complex functional connectivity patterns across stages of cognitive decline, with cognitively normal subjects generally showing higher connectivity in sensorimotor and visual networks compared to patients with Alzheimer, along with patterns suggesting potential compensatory mechanisms.
Whither Bias Goes, I Will Go: An Integrative, Systematic Review of Algorithmic Bias Mitigation
Hickman, Louis, Huynh, Christopher, Gass, Jessica, Booth, Brandon, Kuruzovich, Jason, Tay, Louis
Machine learning (ML) models are increasingly used for personnel assessment and selection (e.g., resume screeners, automatically scored interviews). However, concerns have been raised throughout society that ML assessments may be biased and perpetuate or exacerbate inequality. Although organizational researchers have begun investigating ML assessments from traditional psychometric and legal perspectives, there is a need to understand, clarify, and integrate fairness operationalizations and algorithmic bias mitigation methods from the computer science, data science, and organizational research literatures. We present a four-stage model of developing ML assessments and applying bias mitigation methods, including 1) generating the training data, 2) training the model, 3) testing the model, and 4) deploying the model. When introducing the four-stage model, we describe potential sources of bias and unfairness at each stage. Then, we systematically review definitions and operationalizations of algorithmic bias, legal requirements governing personnel selection from the United States and Europe, and research on algorithmic bias mitigation across multiple domains and integrate these findings into our framework. Our review provides insights for both research and practice by elucidating possible mechanisms of algorithmic bias while identifying which bias mitigation methods are legal and effective. This integrative framework also reveals gaps in the knowledge of algorithmic bias mitigation that should be addressed by future collaborative research between organizational researchers, computer scientists, and data scientists. We provide recommendations for developing and deploying ML assessments, as well as recommendations for future research into algorithmic bias and fairness.
To which reference class do you belong? Measuring racial fairness of reference classes with normative modeling
Rutherford, Saige, Wolfers, Thomas, Fraza, Charlotte, Harrnet, Nathaniel G., Beckmann, Christian F., Ruhe, Henricus G., Marquand, Andre F.
Reference classes in healthcare establish healthy norms, such as pediatric growth charts of height and weight, and are used to chart deviations from these norms which represent potential clinical risk. How the demographics of the reference class influence clinical interpretation of deviations is unknown. Using normative modeling, a method for building reference classes, we evaluate the fairness (racial bias) in reference models of structural brain images that are widely used in psychiatry and neurology. We test whether including race in the model creates fairer models. We predict self-reported race using the deviation scores from three different reference class normative models, to better understand bias in an integrated, multivariate sense. Across all of these tasks, we uncover racial disparities that are not easily addressed with existing data or commonly used modeling techniques. Our work suggests that deviations from the norm could be due to demographic mismatch with the reference class, and assigning clinical meaning to these deviations should be done with caution. Our approach also suggests that acquiring more representative samples is an urgent research priority.
A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds
Orlichenko, Anton, Qu, Gang, Zhou, Ziyu, Liu, Anqi, Deng, Hong-Wen, Ding, Zhengming, Stephen, Julia M., Wilson, Tony W., Calhoun, Vince D., Wang, Yu-Ping
Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. An exception are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.
Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination Won Hwa Kim Charles Hatt
Hypothesis testing on signals defined on surfaces (such as the cortical surface) is a fundamental component of a variety of studies in Neuroscience. The goal here is to identify regions that exhibit changes as a function of the clinical condition under study. As the clinical questions of interest move towards identifying very early signs of diseases, the corresponding statistical differences at the group level invariably become weaker and increasingly hard to identify. Indeed, after a multiple comparisons correction is adopted (to account for correlated statistical tests over all surface points), very few regions may survive. In contrast to hypothesis tests on point-wise measurements, in this paper, we make the case for performing statistical analysis on multi-scale shape descriptors that characterize the local topological context of the signal around each surface vertex. Our descriptors are based on recent results from harmonic analysis, that show how wavelet theory extends to non-Euclidean settings (i.e., irregular weighted graphs). We provide strong evidence that these descriptors successfully pick up group-wise differences, where traditional methods either fail or yield unsatisfactory results. Other than this primary application, we show how the framework allows performing cortical surface smoothing in the native space without mappint to a unit sphere.
Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks
Sun, Huaman, Pei, Jiaxin, Choi, Minje, Jurgens, David
Human perception of language depends on personal backgrounds like gender and ethnicity. While existing studies have shown that large language models (LLMs) hold values that are closer to certain societal groups, it is unclear whether their prediction behaviors on subjective NLP tasks also exhibit a similar bias. In this study, leveraging the POPQUORN dataset which contains annotations of diverse demographic backgrounds, we conduct a series of experiments on four popular LLMs to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness. We find that for both tasks, model predictions are closer to the labels from White and female participants. We further explore prompting with the target demographic labels and show that including the target demographic in the prompt actually worsens the model's performance. More specifically, when being prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both overall scores as well as the scores from corresponding groups. Our results suggest that LLMs hold gender and racial biases for subjective NLP tasks and that demographic-infused prompts alone may be insufficient to mitigate such effects. Code and data are available at https://github.com/Jiaxin-Pei/LLM-Group-Bias.
Integrating Psychometrics and Computing Perspectives on Bias and Fairness in Affective Computing: A Case Study of Automated Video Interviews
Booth, Brandon M, Hickman, Louis, Subburaj, Shree Krishna, Tay, Louis, Woo, Sang Eun, DMello, Sidney K.
We provide a psychometric-grounded exposition of bias and fairness as applied to a typical machine learning pipeline for affective computing. We expand on an interpersonal communication framework to elucidate how to identify sources of bias that may arise in the process of inferring human emotions and other psychological constructs from observed behavior. Various methods and metrics for measuring fairness and bias are discussed along with pertinent implications within the United States legal context. We illustrate how to measure some types of bias and fairness in a case study involving automatic personality and hireability inference from multimodal data collected in video interviews for mock job applications. We encourage affective computing researchers and practitioners to encapsulate bias and fairness in their research processes and products and to consider their role, agency, and responsibility in promoting equitable and just systems. Personal use of this material is permitted. The tools used in affective computing (AC), which enable machines to identify people's behaviors and mental states, are being increasingly utilized in education, healthcare, and the workplace. One application is to aid in the allocation of limited resources (e.g., counseling, mental health care, in-person interviews) via automated screening [1-3]. In these types of high-stakes scenarios, the assessments provided by AC systems can directly affect the decision processes which influence the amount of attention, care, and opportunities afforded to individuals. As such, it is important that these processes are accurate, unbiased, and fair because any deficiencies or errors present in these systems stemming from the data they were trained on, the types of algorithms used, or the decision processes themselves, may disproportionately impact different groups of people and lead to ethical and legal concerns, not to mention pain and suffering for the vulnerable groups impacted. Simply put, AC systems must deter, not propagate, extant systems of inequity and injustice. Fortunately, we have decades of guidance on how to construct fair and unbiased measurement systems.