AITopics | hadfield-menell

Collaborating Authors

hadfield-menell

b607ba543ad05417b8507ee86c54fcb7-Paper.pdf

Neural Information Processing Systems | Feb-9-2026, 22:58:33 GMT

In human principal-agent problems, seemingly inconsequential changes to an agent's incentives often lead to surprising, counter-intuitive, and counter-productive behavior (21). Consequently, we must ask when this misalignment is costly: when is it counter-productive to optimize for an incomplete proxy?

artificial intelligence, machine learning, robot, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

3d719fee332caa23d5038b8a90e81796-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 11:45:30 GMT

proxy, reward function, simplification, (13 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Defining and Characterizing Reward Hacking

Neural Information Processing Systems | Aug-14-2025, 08:04:57 GMT

This makes it crucial to align autonomous AI systems with their users' intentions. Precisely specifying which behaviours are or are not desirable is challenging, however. One approach to this specification problem is to learn an approximation of the true reward function (Ng et al., 2000;

proxy, reward function, simplification, (13 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Lindner, David, Matoba, Kyle, Meulemans, Alexander

arXiv.org Artificial Intelligence | Jan-29-2021

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects, and overall unsafe behavior. To overcome this problem, recent work proposed to augment the specified reward function with an impact regularizer that discourages behavior that has a big impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
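The idea of augmenting a specified reward with an impact regularizer can be sketched in a few lines. This is a minimal illustration, not the formulation of any particular paper: the L1 distance to a baseline state, the weight `lam`, and the name `regularized_reward` are all assumptions made for the example.

```python
def regularized_reward(task_reward, state, baseline_state, lam=1.0):
    """Task reward minus a penalty for deviating from a baseline state.

    Assumptions for illustration: states are equal-length tuples of
    numeric features, and deviation is measured with an L1 distance.
    """
    deviation = sum(abs(s - b) for s, b in zip(state, baseline_state))
    return task_reward - lam * deviation


# Hypothetical features: (box_delivered, vase_intact).
baseline = (0, 1)          # nothing delivered yet, vase intact
careful = (1, 1)           # delivers the box, leaves the vase alone
careless = (1, 0)          # delivers the box, breaks the vase

r_careful = regularized_reward(1.0, careful, baseline, lam=2.0)
r_careless = regularized_reward(1.0, careless, baseline, lam=2.0)
print(r_careful, r_careless)
```

Note that in this toy setup the regularizer also penalizes the intended change (delivering the box), which hints at why the choice of baseline and deviation measure is a design decision rather than a detail.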

agent, baseline, side effect, (15 more...)

arXiv.org Artificial Intelligence

2101.12509

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

Saisubramanian, Sandhya, Zilberstein, Shlomo, Kamar, Ece

arXiv.org Artificial Intelligence | Aug-28-2020

Autonomous agents acting in the real world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model, whether handcrafted or machine acquired, is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent's actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects of the agent's actions is critical to improving the safety and reliability of autonomous systems. This emerging research topic is attracting increased attention due to the increased deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of negative side effects and the recent research efforts to address them. We identify key characteristics of negative side effects, highlight the challenges in avoiding negative side effects, and discuss recently developed approaches, contrasting their benefits and limitations. We conclude with a discussion of open questions and suggestions for future research directions.

artificial intelligence, machine learning, side effect, (15 more...)

arXiv.org Artificial Intelligence

2008.12146

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia > South Korea (0.14)
North America > United States > Washington > King County > Redmond (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Artificial Intelligence Will Do What We Ask. That's a Problem. | Quanta Magazine

#artificialintelligence | Jan-30-2020, 20:08:24 GMT

The danger of having artificially intelligent machines do our bidding is that we might not be careful enough about what we wish for. The lines of code that animate these machines will inevitably lack nuance, forget to spell out caveats, and end up giving AI systems goals and incentives that don't align with our true preferences. A now-classic thought experiment illustrating this problem was posed by the Oxford philosopher Nick Bostrom in 2003. Bostrom imagined a superintelligent robot, programmed with the seemingly innocuous goal of manufacturing paper clips. The robot eventually turns the whole world into a giant paper clip factory. Such a scenario can be dismissed as academic, a worry that might arise in some far-off future.

algorithm, quanta magazine, youtube, (9 more...)

#artificialintelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.06)
North America > United States > California > Alameda County > Berkeley (0.06)

Genre: Research Report (0.51)

Industry:

Government > Voting & Elections (0.51)
Transportation > Passenger (0.33)
Government > Regional Government > North America Government > United States Government (0.32)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.56)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.36)

Less self-assured AI are unlikely to override human orders

Daily Mail - Science & tech | Jun-10-2017, 20:25:14 GMT

In the Terminator film franchise, hyper-intelligent robots learn to operate without their human masters, leading to a machine uprising that wipes out most of mankind. Researchers have now recommended that humans design intelligent robots of the future with less self-assurance to stop them breaking away from human control. The team suggest that over-confident artificial intelligence can cause an array of problems. Their research found that an AI that is too self-assured will override the wishes of its human supervisor.

artificial intelligence, robot, self-confidence, (12 more...)

Daily Mail - Science & tech

Country: North America > United States > California (0.05)

Industry: Information Technology (0.32)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Robots will be more useful if they are made to lack confidence

New Scientist | Jun-5-2017, 18:10:20 GMT

Confidence in your abilities is usually a good thing – as long as you can recognise when it's time to ask for help. As we build ever smarter software, we may want to apply the same thinking to machines. An experiment that explores a robot's sense of its own usefulness could help guide how future artificial intelligences are built. Overconfident AI can cause all kinds of problems, says Dylan Hadfield-Menell at the University of California, Berkeley. Take Facebook's newsfeed algorithms, for example.

artificial intelligence, hadfield-menell, robot, (8 more...)

New Scientist

Country:

North America > United States > California > Alameda County > Berkeley (0.26)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.06)

Industry: Information Technology > Services (0.37)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

The Off-Switch Game

Hadfield-Menell, Dylan (University of California, Berkeley) | Dragan, Anca (University of California, Berkeley) | Abbeel, Pieter (University of California, Berkeley) | Russell, Stuart (University of California, Berkeley)

AAAI Conferences | Feb-4-2017

It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow itself to be switched off. We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
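The role of uncertainty described in this abstract can be illustrated with a small numerical sketch. It is not the paper's formal model: it assumes that R holds a belief over the utility U of its action (represented by samples), that being switched off yields 0, and that H is perfectly rational, permitting the action only when U > 0. The function name `off_switch_values` and the Gaussian belief are hypothetical choices for the example.

```python
import random


def off_switch_values(utility_samples):
    """Expected value to R of each option, given samples from its belief over U.

    Assumptions for illustration: switching off is worth 0, and a rational H
    lets the action proceed only when its true utility is positive.
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n                          # bypass H: E[U]
    switch_off = 0.0                                        # R turns itself off
    defer = sum(max(u, 0.0) for u in utility_samples) / n   # E[max(U, 0)]
    return {"act": act, "switch_off": switch_off, "defer": defer}


rng = random.Random(0)
uncertain_belief = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
certain_belief = [0.5] * 10_000

values_uncertain = off_switch_values(uncertain_belief)
values_certain = off_switch_values(certain_belief)
print(values_uncertain)
print(values_certain)
```

Under these assumptions, deferring to H is never worse than acting or switching off, and it is strictly better whenever R's belief puts weight on both positive and negative utilities. When R is certain, deferring gains it nothing, which matches the abstract's point that the incentive to preserve the off switch comes from uncertainty about the objective.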

artificial intelligence, incentive, machine learning, (17 more...)

AAAI Conferences

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Industry:

Leisure & Entertainment > Games (0.48)
Information Technology (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
