AI Alignment: A Comprehensive Survey
Ji, Jiaming, Qiu, Tianyi, Chen, Boyuan, Zhang, Borong, Lou, Hantao, Wang, Kaile, Duan, Yawen, He, Zhonghao, Zhou, Jiayi, Zhang, Zhaowei, Zeng, Fanzhi, Ng, Kwan Yee, Dai, Juntao, Pan, Xuehai, O'Gara, Aidan, Lei, Yingshan, Xu, Hua, Tse, Brian, Fu, Jie, McAleer, Stephen, Yang, Yaodong, Wang, Yizhou, Zhu, Song-Chun, Guo, Yike, Gao, Wen
–arXiv.org Artificial Intelligence
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update a website (www.alignmentsurvey.com), which features tutorials, collections of papers, blog posts, and other resources.
Jan-2-2024
- Country:
- South America > Chile
- Oceania
- New Zealand > North Island
- Auckland Region > Auckland (0.04)
- Australia > Victoria
- Melbourne (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Virginia (0.04)
- Maryland > Baltimore (0.04)
- Oregon (0.04)
- Minnesota
- Hennepin County > Minneapolis (0.14)
- Ramsey County > Saint Paul (0.04)
- Arizona > Maricopa County
- Phoenix (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Illinois > Cook County
- Chicago (0.04)
- California
- San Francisco County > San Francisco (0.13)
- Los Angeles County > Long Beach (0.13)
- San Mateo County > Menlo Park (0.04)
- Santa Clara County > Mountain View (0.04)
- New York > New York County
- New York City (0.04)
- Canada
- Quebec > Montreal (0.04)
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Europe
- France (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- Spain
- Denmark > Capital Region
- Copenhagen (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- United Kingdom > England
- Greater London > London (0.27)
- Oxfordshire > Oxford (0.14)
- Cambridgeshire > Cambridge (0.13)
- Bristol (0.04)
- Greece > Attica
- Athens (0.04)
- Latvia > Lubāna Municipality
- Lubāna (0.04)
- Italy
- Tuscany > Florence (0.04)
- Marche > Ancona Province
- Ancona (0.04)
- Russia > Northwestern Federal District
- Leningrad Oblast > Saint Petersburg (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Portugal
- Asia
- Africa
- Rwanda > Kigali
- Kigali (0.04)
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)
- Industry:
- Social Sector (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- Transportation (1.00)
- Law > Statutes (0.92)
- Law Enforcement & Public Safety (0.92)
- Media (0.67)
- Health & Medicine > Therapeutic Area (0.67)
- Education > Educational Setting (0.67)
- Energy (0.67)
- Leisure & Entertainment > Games
- Computer Games (0.67)
- Government
- Military (1.00)
- Regional Government (0.67)
- Technology:
- Information Technology
- Human Computer Interaction > Interfaces (1.00)
- Data Science > Data Mining (1.00)
- Communications > Social Media (1.00)
- Artificial Intelligence
- Vision (1.00)
- Robots (1.00)
- Issues > Social & Ethical Issues (1.00)
- Applied AI (1.00)
- Representation & Reasoning
- Optimization (1.00)
- Expert Systems (1.00)
- Personal Assistant Systems (0.67)
- Agents > Agent Societies (0.67)
- Uncertainty > Bayesian Inference (0.67)
- Natural Language
- Large Language Model (1.00)
- Chatbot (1.00)
- Generation (0.67)
- Machine Learning
- Statistical Learning (1.00)
- Reinforcement Learning (1.00)
- Neural Networks > Deep Learning (1.00)
- Learning Graphical Models > Directed Networks
- Bayesian Learning (0.45)
- Cognitive Science
- Problem Solving (1.00)
- Simulation of Human Behavior (0.67)