Reward learning from human preferences and demonstrations in Atari
Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei
Neural Information Processing Systems
To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions.
Nov-20-2025, 18:11:59 GMT
- Country:
- North America > Canada > Quebec > Montreal (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Education (0.68)
- Leisure & Entertainment > Games (1.00)
- Technology: