Defining and Characterizing Reward Hacking
Neural Information Processing Systems
This makes it crucial to align autonomous AI systems with their users' intentions. Precisely specifying which behaviours are or are not desirable is challenging, however. One approach to this specification problem is to learn an approximation of the true reward function (Ng et al., 2000;
Aug-14-2025, 08:04:57 GMT
- Country:
  - Europe > United Kingdom
    - England
      - Cambridgeshire > Cambridge (0.04)
      - Oxfordshire > Oxford (0.04)
  - North America > Canada
  - Oceania > Australia (0.14)
- Technology: