Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI.
–Neural Information Processing Systems
Existing methods such as data mixture strategies face limitations, including heavy reliance on expert knowledge and conflicting optimization signals. While model merging offers parameter-level conflict-resolution strategies by integrating specialized models' parameters, its potential for 3H optimization remains underexplored. This paper presents the first systematic comparison of model merging and data mixture methods for constructing 3H-aligned LLMs, revealing previously overlooked collaborative and conflicting relationships among the 3H dimensions and discussing the advantages and drawbacks of data mixture (data-level) and model merging (parameter-level) methods in mitigating these conflicts for balanced 3H optimization.
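To make the data-level versus parameter-level distinction concrete, the following is a minimal sketch rather than the paper's method. It assumes plain weighted averaging as the merging rule, one of the simplest merging schemes, and a fixed-ratio sampler as the data mixture. The names `merge_linear`, `mix_data`, and the toy parameter dictionaries are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_linear(models: List[StateDict], weights: List[float]) -> StateDict:
    """Parameter-level approach: weighted average of specialized models.

    Assumes every model has the same parameter names and shapes.
    Linear averaging is an illustrative choice, not the paper's method.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1
    merged: StateDict = {}
    for name, values in models[0].items():
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(values))
        ]
    return merged


def mix_data(datasets: Dict[str, List[str]], ratios: Dict[str, float], total: int) -> List[str]:
    """Data-level approach: build one training set from fixed per-dimension ratios.

    The ratios are the hand-tuned choice that makes data mixing depend on
    expert knowledge. Takes the first n examples of each set for determinism.
    """
    mixed: List[str] = []
    for name, examples in datasets.items():
        n = min(len(examples), int(round(ratios[name] * total)))
        mixed.extend(examples[:n])
    return mixed


# Toy usage: three specialized "models" and two small datasets.
helpful = {"w": [1.0, 2.0]}
honest = {"w": [3.0, 4.0]}
harmless = {"w": [5.0, 6.0]}
merged = merge_linear([helpful, honest, harmless], [1.0, 1.0, 1.0])

mixed = mix_data(
    {"helpful": ["a1", "a2", "a3"], "harmless": ["b1", "b2"]},
    {"helpful": 0.5, "harmless": 0.5},
    total=4,
)
```

In this sketch the data mixture produces one model trained on all signals at once, whereas merging trains each dimension separately and combines the results afterwards, which is where parameter-level conflict resolution can be applied.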
Neural Information Processing Systems
Jun-23-2026, 10:53:44 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology > Security & Privacy (0.46)
- Technology: