Metric-DST: Mitigating Selection Bias Through Diversity-Guided Semi-Supervised Metric Learning

Tepeli, Yasin I.; de Wolf, Mathijs; Gonçalves, Joana P.

Nov-28-2024, arXiv.org Artificial Intelligence

Selection bias poses a critical challenge for fairness in machine learning: models trained on data that are less representative of the population may behave undesirably for underrepresented profiles. Semi-supervised learning strategies such as self-training can mitigate selection bias by incorporating unlabeled data into model training, giving further insight into the distribution of the population. However, conventional self-training favors high-confidence samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias by including more diverse samples. Metric-DST learned more robust models in the presence of selection bias on generated and real-world datasets with induced bias, as well as on a molecular biology prediction task with intrinsic bias. The Metric-DST learning strategy offers a flexible and widely applicable solution for mitigating selection bias and enhancing the fairness of machine learning models.
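The abstract contrasts two ways of choosing which unlabeled samples to pseudo-label during self-training. The sketch below is a generic illustration, not the authors' Metric-DST procedure. It assumes a binary classifier that outputs a positive-class probability per unlabeled sample, plus an embedding per sample (for example, from a metric-learning model). `confidence_select` mimics conventional high-confidence selection; `diversity_select` is a hypothetical stand-in that uses greedy farthest-point sampling in the embedding space. All function names and the `min_conf` threshold are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def confidence(p: float) -> float:
    """Confidence of a binary prediction given the positive-class probability."""
    return max(p, 1.0 - p)


def confidence_select(probs: Sequence[float], k: int) -> List[int]:
    """Conventional self-training: pick the k most confident unlabeled samples."""
    order = sorted(range(len(probs)), key=lambda i: confidence(probs[i]), reverse=True)
    return order[:k]


def diversity_select(embeddings: Sequence[Vector], probs: Sequence[float],
                     k: int, min_conf: float = 0.6) -> List[int]:
    """Hypothetical diversity-guided selection (not the paper's exact rule):
    among samples with confidence >= min_conf, greedily add the sample that is
    farthest from everything chosen so far in the embedding space."""
    candidates = [i for i, p in enumerate(probs) if confidence(p) >= min_conf]
    if not candidates or k <= 0:
        return []
    # Seed with the most confident candidate so the result is deterministic.
    first = max(candidates, key=lambda i: confidence(probs[i]))
    chosen = [first]
    # Distance from each candidate to its nearest already-chosen sample.
    nearest = {i: math.dist(embeddings[i], embeddings[first]) for i in candidates}
    while len(chosen) < min(k, len(candidates)):
        nxt = max((i for i in candidates if i not in chosen), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i in candidates:
            nearest[i] = min(nearest[i], math.dist(embeddings[i], embeddings[nxt]))
    return chosen


def pseudo_labels(probs: Sequence[float], indices: Sequence[int],
                  threshold: float = 0.5) -> Dict[int, int]:
    """Assign hard pseudo-labels (1 = positive) to the selected samples."""
    return {i: int(probs[i] >= threshold) for i in indices}


if __name__ == "__main__":
    # Toy data: a dense cluster near the origin and a distant pair of samples.
    toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    toy_probs = [0.99, 0.98, 0.97, 0.75, 0.20]
    print(confidence_select(toy_probs, 2))                  # [0, 1]
    picked = diversity_select(toy_embeddings, toy_probs, 2)
    print(picked)                                           # [0, 4]
    print(pseudo_labels(toy_probs, picked))                 # {0: 1, 4: 0}
```

On the toy data, confidence-based selection takes two near-duplicate points from the dense cluster, while the farthest-point variant also reaches the distant region and picks up a negative-class pseudo-label. Farthest-point sampling is only one simple way to encourage diversity; the abstract does not specify the selection rule Metric-DST actually uses.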

artificial intelligence, machine learning, metric-dst

Country:
- Oceania > Australia
  - Queensland (0.04)
  - New South Wales > Sydney (0.04)
- North America
  - United States (0.14)
  - Canada > Alberta
    - Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe
  - Netherlands > South Holland
    - Delft (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > Experimental Study (0.96)

Industry:
- Health & Medicine
  - Therapeutic Area > Oncology (1.00)
  - Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks (1.00)