AI Alignment: A Comprehensive Survey
Ji, Jiaming, Qiu, Tianyi, Chen, Boyuan, Zhang, Borong, Lou, Hantao, Wang, Kaile, Duan, Yawen, He, Zhonghao, Zhou, Jiayi, Zhang, Zhaowei, Zeng, Fanzhi, Ng, Kwan Yee, Dai, Juntao, Pan, Xuehai, O'Gara, Aidan, Lei, Yingshan, Xu, Hua, Tse, Brian, Fu, Jie, McAleer, Stephen, Yang, Yaodong, Wang, Yizhou, Zhu, Song-Chun, Guo, Yike, Gao, Wen
arXiv.org Artificial Intelligence
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, this survey covers the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update a website (www.alignmentsurvey.com) that features tutorials, collections of papers, blog posts, and other resources.
Jan-2-2024
- Country:
- Asia > Middle East (0.67)
- Europe > United Kingdom
- England > Greater London > London (0.27)
- North America > United States
- California (1.00)
- Genre:
- Overview (1.00)
- Research Report > New Finding (1.00)
- Industry:
- Media (0.67)
- Transportation (1.00)
- Banking & Finance (1.00)
- Health & Medicine > Therapeutic Area (0.67)
- Government
- Military (1.00)
- Regional Government (0.92)
- Law (1.00)
- Energy (0.67)
- Education > Educational Setting (0.67)
- Leisure & Entertainment > Games
- Computer Games (0.67)
- Information Technology > Security & Privacy (1.00)
- Social Sector (1.00)
- Law Enforcement & Public Safety (0.92)
- Technology:
- Information Technology
- Artificial Intelligence
- Applied AI (1.00)
- Cognitive Science
- Problem Solving (1.00)
- Simulation of Human Behavior (0.67)
- Issues > Social & Ethical Issues (1.00)
- Machine Learning
- Learning Graphical Models > Directed Networks
- Bayesian Learning (0.45)
- Neural Networks > Deep Learning (1.00)
- Reinforcement Learning (1.00)
- Statistical Learning (1.00)
- Natural Language
- Chatbot (1.00)
- Generation (0.67)
- Large Language Model (1.00)
- Representation & Reasoning
- Agents > Agent Societies (0.67)
- Expert Systems (1.00)
- Optimization (1.00)
- Personal Assistant Systems (0.67)
- Uncertainty > Bayesian Inference (0.67)
- Robots (1.00)
- Vision (1.00)
- Communications > Social Media (1.00)
- Data Science > Data Mining (1.00)
- Human Computer Interaction > Interfaces (1.00)