AITopics | Elson, David

MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking

Farquhar, Sebastian, Varma, Vikrant, Lindner, David, Elson, David, Biddulph, Caleb, Goodfellow, Ian, Shah, Rohin

arXiv.org Artificial Intelligence, Jan-22-2025

Future advanced AI systems may learn sophisticated strategies through reinforcement learning (RL) that humans cannot understand well enough to safely evaluate. We propose a training method that prevents agents from learning undesired multi-step plans that receive high reward (multi-step "reward hacks"), even if humans are not able to detect that the behaviour is undesired. The method, Myopic Optimization with Non-myopic Approval (MONA), works by combining short-sighted optimization with far-sighted reward. We demonstrate that MONA can prevent multi-step reward hacking that ordinary RL causes, even without being able to detect the reward hacking and without any extra information that ordinary RL does not get access to. We study MONA empirically in three settings that model different misalignment failure modes: two 2-step environments with LLMs, representing delegated oversight and encoded reasoning, and longer-horizon gridworld environments representing sensor tampering.

large language model, machine learning, reinforcement learning, (18 more...)

arXiv:2501.13011

Genre:

Workflow (1.00)
Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.68)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)


Reports on the Fourth Artificial Intelligence for Interactive Digital Entertainment Conference Workshops

Elson, David (Google) | Rowe, Jonathan (North Carolina State University) | Smith, Adam M. (University of California, Santa Cruz) | Smith, Gillian (University of California, Santa Cruz) | Tomai, Emmett (University of Texas - Pan American)

AI Magazine, May-14-2012

The Seventh Artificial Intelligence for Interactive Digital Entertainment Conference (AIIDE-11) was held October 11–14, 2011 at Stanford University, Stanford, California. Two one-day workshops were held on October 11: Artificial Intelligence in the Game Design Process, and Intelligent Narrative Technologies. The highlights of each workshop are presented in this report.

artificial intelligence, interactive digital entertainment, management and information, (2 more...)


Technology: Information Technology > Artificial Intelligence (1.00)


Reports on the Fourth Artificial Intelligence for Interactive Digital Entertainment Conference Workshops

AI Magazine, May-14-2012

artificial intelligence, computer game, workshop, (12 more...)


Country: North America > United States > California > Santa Clara County > Stanford (0.25)

Genre: Instructional Material (0.35)

Industry: Leisure & Entertainment > Games > Computer Games (0.97)

Technology: Information Technology > Artificial Intelligence (1.00)
