A Comprehensive Evaluation Framework of Alignment Techniques for LLMs

Azmat, Muneeza; Abbas, Momin; de Macedo, Maysa Malfiza Garcia; Grave, Marcelo Carpinette; de Souza, Luan Soares; Machado, Tiago; de Paula, Rogerio A; Horesh, Raya; Chen, Yixin; Candello, Heloisa Caroline de Souza Pereira; Nordenlow, Rebecka; Adebiyi, Aminat

arXiv.org Artificial Intelligence 

As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring that their outputs align with human values and safety standards has become critical. The field has developed diverse alignment approaches, including traditional fine-tuning methods (RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of a unified evaluation framework makes it difficult to compare these paradigms systematically and to guide deployment decisions. This paper introduces a comprehensive, multi-dimensional evaluation framework that provides a systematic comparison across all major alignment paradigms. The framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. Through experiments across diverse base models and alignment strategies, we demonstrate the utility of the framework in identifying the strengths and limitations of current state-of-the-art models, providing insights for future research directions.
