Scalable Bilevel Loss Balancing for Multi-Task Learning
Peiyao Xiao, Chaosheng Dong, Shaofeng Zou, Kaiyi Ji
arXiv.org Artificial Intelligence
In recent years, Multi-Task Learning (MTL) has received increasing attention for its ability to predict multiple tasks simultaneously with a single model, thereby reducing computational overhead. This versatility has enabled a wide range of applications, including autonomous driving (Chen et al., 2018), recommendation systems (Wang et al., 2020), and natural language processing (Zhang et al., 2022). Research in MTL typically follows two main schemes. Scalarization-based methods, such as linear scalarization, reduce MTL to a scalar optimization problem by taking an averaged or weighted sum of the task losses as the objective. Due to their simplicity and scalability, they became the prominent approach in early studies (Caruana, 1997). However, they often cause performance degradation compared with single-task learning because of gradient conflict (Yu et al., 2020; Liu et al., 2021a). Gradient conflict arises for two main reasons: 1) task gradients point in different directions, and 2) gradient magnitudes vary significantly. As a result, the final update direction may either be largely cancelled out or dominated by the task with the largest gradient (Liu et al., 2021b). To mitigate this issue, various gradient manipulation methods have been developed to find balanced and fair solutions by seeking a better conflict-aware update direction (Désidéri, 2012; Liu et al., 2021a; Ban & Ji,
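To make the two failure modes concrete, the sketch below (an illustration, not the paper's method) scalarizes two toy regression losses with fixed weights and measures both sources of conflict on the shared parameters: the cosine similarity between per-task gradients (direction) and the ratio of their norms (magnitude). The model, data, and task weights are illustrative assumptions.

```python
# Minimal sketch of linear scalarization and a gradient-conflict check.
# All names (shared, heads, weights) are illustrative placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Linear(16, 8)                                    # shared backbone
heads = nn.ModuleList([nn.Linear(8, 1), nn.Linear(8, 1)])    # one head per task

x = torch.randn(32, 16)
targets = [torch.randn(32, 1), torch.randn(32, 1)]
loss_fn = nn.MSELoss()

# Per-task losses through the shared representation.
features = shared(x)
losses = [loss_fn(head(features), t) for head, t in zip(heads, targets)]

# Linear scalarization: a fixed weighted sum reduces MTL to one scalar objective.
weights = [0.5, 0.5]                                         # illustrative task weights
total_loss = sum(w * l for w, l in zip(weights, losses))

# Gradient conflict on the shared parameters:
# 1) direction: negative cosine similarity means the task gradients conflict;
# 2) magnitude: a large norm ratio means one task can dominate the update.
grads = []
for l in losses:
    g = torch.autograd.grad(l, shared.parameters(), retain_graph=True)
    grads.append(torch.cat([p.reshape(-1) for p in g]))

cos = torch.nn.functional.cosine_similarity(grads[0], grads[1], dim=0)
norm_ratio = grads[0].norm() / grads[1].norm()
print(f"cosine similarity: {cos.item():.3f}, norm ratio: {norm_ratio.item():.3f}")

total_loss.backward()                                        # update from the scalarized objective
```

A negative cosine similarity or a norm ratio far from 1 signals the cancellation and domination effects described above, which gradient manipulation methods aim to correct.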
Feb-12-2025