AI Alignment: A Comprehensive Survey

Ji, Jiaming, Qiu, Tianyi, Chen, Boyuan, Zhang, Borong, Lou, Hantao, Wang, Kaile, Duan, Yawen, He, Zhonghao, Zhou, Jiayi, Zhang, Zhaowei, Zeng, Fanzhi, Ng, Kwan Yee, Dai, Juntao, Pan, Xuehai, O'Gara, Aidan, Lei, Yingshan, Xu, Hua, Tse, Brian, Fu, Jie, McAleer, Stephen, Yang, Yaodong, Wang, Yizhou, Zhu, Song-Chun, Guo, Yike, Gao, Wen

Jan-2-2024–arXiv.org Artificial Intelligence

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, this survey delves into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update a website (www.alignmentsurvey.com) that features tutorials, collections of papers, blog posts, and other resources.

reward model overoptimization, unrestricted adversarial attack, virtual event punta cana

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Oceania
  - New Zealand > North Island
    - Auckland Region > Auckland (0.04)
  - Australia > Victoria
    - Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Virginia (0.04)
    - Maryland > Baltimore (0.04)
    - Oregon (0.04)
    - Minnesota
      - Hennepin County > Minneapolis (0.14)
      - Ramsey County > Saint Paul (0.04)
    - Arizona > Maricopa County
      - Phoenix (0.04)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Utah > Salt Lake County
      - Salt Lake City (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - California
      - San Francisco County > San Francisco (0.13)
      - Los Angeles County > Long Beach (0.13)
      - San Mateo County > Menlo Park (0.04)
      - Santa Clara County > Mountain View (0.04)
    - New York > New York County
      - New York City (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - France (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Spain
    - Galicia > A Coruña Province
      - Santiago de Compostela (0.04)
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - United Kingdom > England
    - Greater London > London (0.27)
    - Oxfordshire > Oxford (0.14)
    - Cambridgeshire > Cambridge (0.13)
    - Bristol (0.04)
  - Greece > Attica
    - Athens (0.04)
  - Latvia > Lubāna Municipality
    - Lubāna (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Marche > Ancona Province
      - Ancona (0.04)
  - Russia > Northwestern Federal District
    - Leningrad Oblast > Saint Petersburg (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Portugal
    - Vila Real > Vila Real (0.04)
    - Porto > Porto (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Russia (0.04)
  - Macao (0.04)
  - Indonesia > Bali (0.04)
  - Middle East
    - Jordan (0.04)
    - Israel (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.14)
- Africa
  - Rwanda > Kigali
    - Kigali (0.04)
  - Ethiopia > Addis Ababa
    - Addis Ababa (0.04)

Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)

Industry:
- Social Sector (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- Transportation (1.00)
- Law > Statutes (0.92)
- Law Enforcement & Public Safety (0.92)
- Media (0.67)
- Health & Medicine > Therapeutic Area (0.67)
- Education > Educational Setting (0.67)
- Energy (0.67)
- Leisure & Entertainment > Games
  - Computer Games (0.67)
- Government
  - Military (1.00)
  - Regional Government (0.67)

Technology:
- Information Technology
  - Human Computer Interaction > Interfaces (1.00)
  - Data Science > Data Mining (1.00)
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Robots (1.00)
    - Issues > Social & Ethical Issues (1.00)
    - Applied AI (1.00)
    - Representation & Reasoning
      - Optimization (1.00)
      - Expert Systems (1.00)
      - Personal Assistant Systems (0.67)
      - Agents > Agent Societies (0.67)
      - Uncertainty > Bayesian Inference (0.67)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
      - Generation (0.67)
    - Machine Learning
      - Statistical Learning (1.00)
      - Reinforcement Learning (1.00)
      - Neural Networks > Deep Learning (1.00)
      - Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)
    - Cognitive Science
      - Problem Solving (1.00)
      - Simulation of Human Behavior (0.67)