Defining and Characterizing Reward Hacking

Aug-14-2025, 08:04:57 GMT–Neural Information Processing Systems

This makes it crucial to align autonomous AI systems with their users' intentions. However, precisely specifying which behaviours are desirable and which are not is challenging. One approach to this specification problem is to learn an approximation of the true reward function (Ng et al., 2000;

proxy, reward function, simplification, (13 more...)

Country:
- Oceania > Australia (0.14)
- North America > Canada
  - Quebec > Montreal (0.04)
- Europe > United Kingdom
  - England
    - Cambridgeshire > Cambridge (0.04)
    - Oxfordshire > Oxford (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (0.69)
  - Machine Learning > Reinforcement Learning (0.48)
