Parametrically Retargetable Decision-Makers Tend To Seek Power
–Neural Information Processing Systems
In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive [ Turner et al., 2021 ]. However, the real world is neither fully observable, nor must trained agents be even approximately reward-optimal.
Neural Information Processing Systems
Nov-16-2025, 03:18:01 GMT
- Country:
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- North America > United States
- Oregon (0.04)
- Europe > United Kingdom
- Genre:
- Research Report (0.47)
- Industry:
- Leisure & Entertainment > Games (0.48)