A Comprehensive Evaluation framework of Alignment Techniques for LLMs

Azmat, Muneeza, Abbas, Momin, de Macedo, Maysa Malfiza Garcia, Grave, Marcelo Carpinette, de Souza, Luan Soares, Machado, Tiago, de Paula, Rogerio A, Horesh, Raya, Chen, Yixin, Candello, Heloisa Caroline de Souza Pereira, Nordenlow, Rebecka, Adebiyi, Aminat

Aug-15-2025 – arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring that their outputs align with human values and safety standards has become critical. The field has developed diverse alignment approaches, including traditional fine-tuning methods (RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of a unified evaluation framework makes it difficult to compare these paradigms systematically and to guide deployment decisions. This paper introduces a multi-dimensional evaluation framework for LLM alignment techniques that supports systematic comparison across all major alignment paradigms. The framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. Through experiments across diverse base models and alignment strategies, we demonstrate the framework's utility in identifying the strengths and limitations of current state-of-the-art models, providing insights for future research directions.

large language model, machine learning, natural language, (17 more...)

Country:
- North America > Mexico (0.28)
- Asia > Middle East
  - UAE (0.28)

Genre:
- Research Report (1.00)

Industry:
- Law (1.00)
- Information Technology (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Health & Medicine > Therapeutic Area
  - Psychiatry/Psychology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
