The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

Kim, HyunJin; Yi, Xiaoyuan; Yao, Jing; Lian, Jianxun; Huang, Muhua; Duan, Shitong; Bak, JinYeong; Xie, Xing

Dec-25-2024–arXiv.org Artificial Intelligence

The emergence of large language models (LLMs) has sparked discussion of Artificial Superintelligence (ASI), a hypothetical AI system surpassing human intelligence. Though ASI is still hypothetical and far from current AI capabilities, existing alignment methods struggle to guide such advanced AI and ensure its safety in the future, so it is essential to discuss the alignment of such AI now. Superalignment, the alignment of AI systems at superhuman levels of capability with human values and safety requirements, aims to address two primary goals: scalability in supervision to provide high-quality guidance signals, and robust governance to ensure alignment with human values. In this survey, we review the original scalable oversight problem and the corresponding methods and potential solutions for superalignment. Specifically, we introduce the challenges and limitations of current alignment paradigms in addressing the superalignment problem. Then we review scalable oversight methods for superalignment. Finally, we discuss the key challenges and propose pathways ...

Figure 1: Challenges from the perspectives of supervision and governance. While the supervision perspective focuses on providing high-quality guidance signals for enhancing system competence, the governance perspective emphasizes aligning the behavior of advanced AI with human values to prevent harmful outcomes.

Country:
- Asia > Thailand
  - Bangkok > Bangkok (0.04)
- North America > United States
  - Illinois > Cook County > Chicago (0.04)

Genre:
- Overview (1.00)
- Research Report > Promising Solution (0.66)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning (1.00)
