Atomic Consistency Preference Optimization for Long-Form Question Answering
Chen, Jingfeng, Thirukovalluru, Raghuveer, Wang, Junlin, Luo, Kaiwei, Dhingra, Bhuwan
arXiv.org Artificial Intelligence
Large Language Models (LLMs) often produce factoid hallucinations: plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by training on curated (factual, non-factual) pairs. However, this approach often relies on a stronger model (e.g., GPT-4) or an external knowledge base to assess factual correctness, which may not always be accessible. Addressing this, we propose Atomic Consistency Preference Optimization (ACPO), a self-supervised preference-tuning method that enhances factual accuracy without external supervision. ACPO leverages atomic consistency signals (i.e., the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for model alignment. Despite being fully self-supervised, ACPO outperforms a strong supervised alignment baseline by 1.95 points averaged across Phi-3 and Llama3 on the LongFact and BioGen datasets, demonstrating its effectiveness in improving factual reliability without relying on external models or knowledge bases.
Nov-11-2025