Sampling Imbalanced Data with Multi-objective Bilevel Optimization
Medlin, Karen, Leyffer, Sven, Raghavan, Krishnan
arXiv.org Artificial Intelligence
Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints, which results in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naïve resampling, risk overfitting and consequently fail to improve classification because they do not consider the diversity between the majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric, the "$\epsilon/\delta$ non-overlapping diversification metric", that quantifies the goodness of a sampling method with respect to model performance. With this metric, we experimentally demonstrate state-of-the-art performance, with improved diversity driving a 1–15% increase in $F1$ scores.
Jul-11-2025
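The abstract contrasts MOODS with two traditional baselines: reweighting the loss function and naïve resampling. The sketch below illustrates only those baselines and a minority-class F1 score; it is not MOODS and not the $\epsilon/\delta$ metric, neither of which is defined in this abstract. It assumes binary labels, uses inverse-frequency class weights (one common reweighting choice, matching scikit-learn's "balanced" formula `n_samples / (n_classes * n_c)`), and treats the minority class as the positive class for F1. All function names are hypothetical helpers written for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: per-class loss weights n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_oversample(X, y, minority, seed=0):
    """Hypothetical helper (binary labels assumed): duplicate randomly chosen
    minority points until the minority count equals the majority count."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    # Exact copies only: no new points are created, so diversity does not grow.
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


def random_undersample(X, y, majority, seed=0):
    """Hypothetical helper (binary labels assumed): randomly drop majority
    points so that both classes have the same count."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    other_idx = [i for i, label in enumerate(y) if label != majority]
    keep = rng.sample(majority_idx, min(len(other_idx), len(majority_idx)))
    idx = sorted(other_idx + keep)
    return [X[i] for i in idx], [y[i] for i in idx]


def f1_score(y_true, y_pred, positive):
    """F1 for one class treated as positive (assumed here to be the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 majority points, 2 minority points
    print(inverse_frequency_weights(y))  # {0: 0.625, 1: 2.5}
    _, y_over = random_oversample(X, y, minority=1)
    print(Counter(y_over))  # 8 points of each class
    _, y_under = random_undersample(X, y, majority=0)
    print(Counter(y_under))  # 2 points of each class
    print(f1_score([0, 0, 1, 1], [0, 1, 1, 0], positive=1))  # 0.5
```

Note that the oversampled set contains only exact copies of existing minority points, and the undersampled set simply discards majority points at random; neither step looks at how the two classes are distributed relative to each other, which is the lack of diversity awareness the abstract attributes to naïve resampling.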
- Country:
- Asia > South Korea
- Gyeongsangnam-do > Changwon (0.04)
- Europe
- Switzerland > Geneva
- Geneva (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- North America > United States
- North Carolina (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Education (0.46)
- Law Enforcement & Public Safety > Fraud (0.46)