Occam's razor is insufficient to infer the preferences of irrational agents
Stuart Armstrong, Sören Mindermann
Neural Information Processing Systems
In today's reinforcement learning systems, a simple reward function is often hand-crafted, and still
Nov-18-2025, 20:27:10 GMT