Goto

Collaborating Authors

 groupdro




FairTune: A Bias-Aware Fine-Tuning Framework Towards Fair Heart Rate Prediction from PPG

arXiv.org Artificial Intelligence

Foundation models pretrained on physiological data such as photoplethysmography (PPG) signals are increasingly used to improve heart rate (HR) prediction across diverse settings. Fine-tuning these models for local deployment is often seen as a practical and scalable strategy. However, its impact on demographic fairness--particularly under domain shifts--remains underexplored. We fine-tune PPG-GPT -- a transformer-based foundation model pretrained on intensive care unit (ICU) data -- across three heterogeneous datasets (ICU, wearable, smartphone) and systematically evaluate the effects on HR prediction accuracy and gender fairness. While fine-tuning substantially reduces mean absolute error (up to 80%), it can simultaneously widen fairness gaps, especially in larger models and under significant distributional characteristics shifts. To address this, we introduce FairTune, a bias-aware fine-tuning framework in which we benchmark three mitigation strategies: class weighting based on inverse group frequency (IF), Group Distributionally Robust Optimization (GroupDRO), and adversarial debiasing (ADV). We find that IF and GroupDRO significantly reduce fairness gaps without compromising accuracy, with effectiveness varying by deployment domain. Representation analyses further reveal that mitigation techniques reshape internal embeddings to reduce demographic clustering. Our findings highlight that fairness does not emerge as a natural byproduct of fine-tuning and that explicit mitigation is essential for equitable deployment of physiological foundation models.



MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models

arXiv.org Artificial Intelligence

In vision-language models (VLMs), the ability to perceive and interpret color and physical environment is crucial for achieving contextually accurate understanding and interaction. However, despite advances in multimodal modeling, there remains a significant lack of specialized datasets that rigorously evaluate a model's capacity to discern subtle color variations and spatial context -- critical elements for situational comprehension and reliable deployment across real-world applications. Toward that goal, we curate MegaCOIN, a high-quality, human-labeled dataset based on \emph{real} images with various contextual attributes. MegaCOIN consists of two parts: MegaCOIN-Instruct, which serves as a supervised fine-tuning (SFT) dataset for VLMs; and MegaCOIN-Bench, an annotated test set that can be used as a stand-alone QA dataset. MegaCOIN~provides three annotated features for 220,000 real images: foreground color, background color, and description of an object's physical environment, constituting 660k human annotations. In addition, MegaCOIN can be applied to benchmark domain generalization (DG) algorithms. We explore benchmarking DG methods in the linear probing setup for VLM and show some new insights. Last but not least, we show that VLMs, including GPT-4o, have subpar color recognition capabilities, and fine-tuning with MegaCOIN can result in improved performance on visual evaluation tasks. In certain cases, MegaCOIN fine-tuned small-scale opensource models such as LLaVA and Bunny can outperform closed-source GPT-4o. We hope the utilities of MegaCOIN can shed light on the directions VLMs can improve and provide a more complex platform for domain generalization algorithms.


Curriculum-enhanced GroupDRO: Challenging the Norm of Avoiding Curriculum Learning in Subpopulation Shift Setups

arXiv.org Artificial Intelligence

In subpopulation shift scenarios, a Curriculum Learning (CL) approach would only serve to imprint the model weights, early on, with the easily learnable spurious correlations featured. To the best of our knowledge, none of the current state-of-the-art subpopulation shift approaches employ any kind of curriculum. To overcome this, we design a CL approach aimed at initializing the model weights in an unbiased vantage point in the hypothesis space which sabotages easy convergence towards biased hypotheses during the final optimization based on the entirety of the available data. We hereby propose a Curriculum-enhanced Group Distributionally Robust Optimization (CeGDRO) approach, which prioritizes the hardest bias-confirming samples and the easiest bias-conflicting samples, leveraging GroupDRO to balance the initial discrepancy in terms of difficulty. We benchmark our proposed method against the most popular subpopulation shift datasets, showing an increase over the state-of-the-art results across all scenarios, up to 6.2% on Waterbirds.


ConceptDrift: Uncovering Biases through the Lens of Foundation Models

arXiv.org Artificial Intelligence

An important goal of ML research is to identify and mitigate unwanted biases intrinsic to datasets and already incorporated into pre-trained models. Previous approaches have identified biases using highly curated validation subsets, that require human knowledge to create in the first place. This limits the ability to automate the discovery of unknown biases in new datasets. We solve this by using interpretable vision-language models, combined with a filtration method using LLMs and known concept hierarchies. More exactly, for a dataset, we use pre-trained CLIP models that have an associated embedding for each class and see how it drifts through learning towards embeddings that disclose hidden biases. We call this approach ConceptDrift and show that it can be scaled to automatically identify biases in datasets like ImageNet without human prior knowledge. We propose two bias identification evaluation protocols to fill the gap in the previous work and show that our method significantly improves over SoTA methods, both using our protocol and classical evaluations. Alongside validating the identified biases, we also show that they can be leveraged to improve the performance of different methods. Our method is not bounded to a single modality, and we empirically validate it both on image (Waterbirds, CelebA, ImageNet), and text datasets (CivilComments).


In-context Learning in Presence of Spurious Correlations

arXiv.org Artificial Intelligence

Large language models exhibit a remarkable capacity for in-context learning, where they learn to solve tasks given a few examples. Recent work has shown that transformers can be trained to perform simple regression tasks in-context. This work explores the possibility of training an in-context learner for classification tasks involving spurious features. We find that the conventional approach of training in-context learners is susceptible to spurious features. Moreover, when the meta-training dataset includes instances of only one task, the conventional approach leads to task memorization and fails to produce a model that leverages context for predictions. Based on these observations, we propose a novel technique to train such a learner for a given classification task. Remarkably, this in-context learner matches and sometimes outperforms strong methods like ERM and GroupDRO. However, unlike these algorithms, it does not generalize well to other tasks. We show that it is possible to obtain an in-context learner that generalizes to unseen tasks by training on a diverse dataset of synthetic in-context learning instances.


Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization

arXiv.org Artificial Intelligence

We study the problem of training an unbiased and accurate model given a dataset with multiple biases. This problem is challenging since the multiple biases cause multiple undesirable shortcuts during training, and even worse, mitigating one may exacerbate the other. We propose a novel training method to tackle this challenge. Our method first groups training data so that different groups induce different shortcuts, and then optimizes a linear combination of group-wise losses while adjusting their weights dynamically to alleviate conflicts between the groups in performance; this approach, rooted in the multi-objective optimization theory, encourages to achieve the minimax Pareto solution. We also present a new benchmark with multiple biases, dubbed MultiCelebA, for evaluating debiased training methods under realistic and challenging scenarios. Our method achieved the best on three datasets with multiple biases, and also showed superior performance on conventional single-bias datasets.


Controllable Prompt Tuning For Balancing Group Distributional Robustness

arXiv.org Artificial Intelligence

Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them. However, directly applying such optimization involves updating the parameters of the entire network, making it both computationally expensive and challenging. Thus, we introduce Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques. On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data, while requiring only 0.4% tunable parameters.