Atomic Consistency Preference Optimization for Long-Form Question Answering
Chen, Jingfeng, Thirukovalluru, Raghuveer, Wang, Junlin, Luo, Kaiwei, Dhingra, Bhuwan
arXiv.org Artificial Intelligence
Large Language Models (LLMs) often produce factoid hallucinations: plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by training on curated (factual, non-factual) pairs. However, this approach often relies on a stronger model (e.g., GPT-4) or an external knowledge base to assess factual correctness, and neither may always be accessible. To address this, we propose Atomic Consistency Preference Optimization (ACPO), a self-supervised preference-tuning method that enhances factual accuracy without external supervision. ACPO leverages atomic consistency signals (i.e., the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for model alignment. Despite being fully self-supervised, ACPO outperforms the strong supervised alignment baseline by 1.95 points, averaged across Phi-3 and Llama3 on the LongFact and BioGen datasets, demonstrating its effectiveness in improving factual reliability without relying on external models or knowledge bases.
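The idea of an atomic consistency signal can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each sampled response has already been decomposed into a set of atomic fact strings, and it treats two facts as "in agreement" only when the strings match exactly, which is a deliberate simplification. The function names `consistency_scores` and `pick_preference_pair` are hypothetical and chosen for illustration.

```python
from typing import List, Set, Tuple


def consistency_scores(facts_per_response: List[Set[str]]) -> List[float]:
    """Score each response by how often its atomic facts recur in the other samples.

    Assumption: facts are pre-extracted strings and agreement is exact string match.
    """
    n = len(facts_per_response)
    scores: List[float] = []
    for i, facts in enumerate(facts_per_response):
        if not facts or n < 2:
            # No facts, or nothing to compare against: no consistency evidence.
            scores.append(0.0)
            continue
        others = [f for j, f in enumerate(facts_per_response) if j != i]
        # Support of one fact = fraction of the other responses that contain it.
        support = [sum(fact in o for o in others) / len(others) for fact in facts]
        scores.append(sum(support) / len(support))
    return scores


def pick_preference_pair(
    responses: List[str], facts_per_response: List[Set[str]]
) -> Tuple[str, str]:
    """Return (chosen, rejected): the most and least self-consistent responses."""
    scores = consistency_scores(facts_per_response)
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    return responses[best], responses[worst]


# Toy usage with three sampled answers to the same question.
responses = ["answer A", "answer B", "answer C"]
facts = [
    {"born in 1879", "was a physicist"},
    {"born in 1879", "was a physicist", "won a Nobel Prize"},
    {"born in 1900", "was a chemist"},
]
chosen, rejected = pick_preference_pair(responses, facts)
```

Pairs selected this way could serve as the (chosen, rejected) examples for a preference-tuning step. A practical system would likely need a softer notion of agreement than exact string equality, for example matching paraphrased facts.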
Nov-11-2025
- Country:
  - North America > United States (0.47)
  - Asia > Middle East
    - UAE (0.28)
- Genre:
  - Research Report (0.64)
- Industry:
  - Health & Medicine (0.35)
- Technology: