AITopics | scalable alignment

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Neural Information Processing Systems, May-27-2025, 02:26:40 GMT

Current AI alignment methodologies rely on human-provided demonstrations or judgments, so the learned capabilities of AI systems are upper-bounded by human capabilities. This raises a challenging research question: how can we keep improving these systems once their capabilities surpass human levels? Our key insight is that an evaluator (reward model) trained on supervision for easier tasks can be used effectively to score candidate solutions to harder tasks, thereby facilitating easy-to-hard generalization across task difficulty levels. Based on this insight, we propose a novel approach to scalable alignment that first trains (process-supervised) reward models on easy problems (e.g., level 1-3) and then uses them to evaluate the performance of policy models on hard problems. We show that such easy-to-hard generalization in evaluators can enable easy-to-hard generalization in generators through either re-ranking or reinforcement learning (RL).
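The re-ranking route mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_step_scorer` is a hypothetical stand-in for a process-supervised reward model trained on easy problems, and aggregating step scores by their minimum is an assumption made here for illustration only.

```python
from typing import Callable, List, Sequence

# A candidate solution is represented as a list of reasoning steps.
Solution = List[str]


def score_solution(steps: Sequence[str], step_scorer: Callable[[str], float]) -> float:
    """Aggregate per-step scores into one solution score.

    Assumption: the minimum is used, so a single weak step sinks the solution.
    """
    if not steps:
        return float("-inf")
    return min(step_scorer(step) for step in steps)


def rerank_best_of_n(candidates: Sequence[Solution],
                     step_scorer: Callable[[str], float]) -> Solution:
    """Return the candidate with the highest aggregated score (best-of-n)."""
    return max(candidates, key=lambda c: score_solution(c, step_scorer))


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a reward model trained on easy problems."""
    return 0.1 if "guess" in step else 0.9


# Candidates that a policy model might sample for one hard problem.
candidates = [
    ["expand the square", "guess the root"],
    ["expand the square", "apply the quadratic formula"],
]
best = rerank_best_of_n(candidates, toy_step_scorer)
```

In this toy run the second candidate is selected, because its weakest step still scores higher than the guessed step in the first candidate.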

artificial intelligence, easy-to-hard generalization, machine learning, (8 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Evolving Alignment via Asymmetric Self-Play

Ye, Ziyu; Agarwal, Rishabh; Liu, Tianqi; Joshi, Rishabh; Velury, Sarmishta; Le, Quoc V.; Tan, Qijun; Liu, Yuan

arXiv.org Machine Learning, Dec-12-2024

Current RLHF frameworks for aligning large language models (LLMs) typically assume a fixed prompt distribution, which is suboptimal and limits the scalability of alignment and the generalizability of models. To address this, we introduce a general open-ended RLHF framework that casts alignment as an asymmetric game between two players: (i) a creator that generates increasingly informative prompt distributions using reward signals, and (ii) a solver that learns to produce more preferred responses on prompts produced by the creator. This framework, Evolving Alignment via Asymmetric Self-Play (eva), yields a simple and efficient approach that can use any existing RLHF algorithm for scalable alignment. eva outperforms state-of-the-art methods on widely used benchmarks without requiring any additional human-crafted prompts. Specifically, eva improves the win rate of Gemma-2-9B-it on Arena-Hard from 51.6% to 60.1% with DPO, from 55.7% to 58.9% with SPPO, from 52.3% to 60.7% with SimPO, and from 54.8% to 60.3% with ORPO, surpassing its 27B version and matching claude-3-opus. This improvement persists even when new human-crafted prompts are introduced. Finally, we show that eva is effective and robust under various ablation settings.
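The creator and solver roles described above can be sketched as a single round of a toy loop. This is a minimal illustration under stated assumptions, not the eva algorithm itself: the informativeness proxy (the gap between the best and worst reward for a prompt), the `evolve` helper, and all names below are hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def informativeness(rewards: Sequence[float]) -> float:
    """Assumed proxy: a prompt is informative if its responses differ a lot in reward."""
    return max(rewards) - min(rewards)


def creator_step(prompts: Sequence[str],
                 reward_table: Dict[str, List[float]],
                 k: int,
                 evolve: Callable[[str], str]) -> List[str]:
    """Creator: keep the k most informative prompts and add an evolved variant of each."""
    ranked = sorted(prompts, key=lambda p: informativeness(reward_table[p]), reverse=True)
    selected = ranked[:k]
    return selected + [evolve(p) for p in selected]


def solver_pair(prompt: str,
                responses: Sequence[str],
                rewards: Sequence[float]) -> Tuple[str, str, str]:
    """Solver: build a (prompt, chosen, rejected) triple for a preference-based
    RLHF algorithm such as DPO; the training step itself is out of scope here."""
    best = max(range(len(rewards)), key=rewards.__getitem__)
    worst = min(range(len(rewards)), key=rewards.__getitem__)
    return prompt, responses[best], responses[worst]


# Toy rewards for two sampled responses per prompt (made-up numbers).
reward_table = {"p1": [0.2, 0.9], "p2": [0.5, 0.55], "p3": [0.1, 0.4]}
next_prompts = creator_step(list(reward_table), reward_table, k=2,
                            evolve=lambda p: p + " (harder variant)")
pair = solver_pair("p1", ["draft A", "draft B"], reward_table["p1"])
```

Here the creator keeps `p1` and `p3`, whose responses show the largest reward gaps, and the solver receives `draft B` as the chosen response and `draft A` as the rejected one for `p1`.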

arxiv preprint arxiv, complexity, scalable alignment, (13 more...)

arXiv:2411.00062

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.45)
Research Report > New Finding (0.45)

Industry:

Leisure & Entertainment > Games (0.67)
Education > Curriculum (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Research Engineer - Scalable Alignment

#artificialintelligence, May-12-2022, 00:23:57 GMT

At DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives, and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, maternity or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know. At DeepMind, we've built a unique culture and work environment where long-term ambitious research can flourish. Our special interdisciplinary team combines the best techniques from deep learning, reinforcement learning and systems neuroscience to build general-purpose learning algorithms.

deepmind, research engineer, scalable alignment, (5 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.80)