Parametrically Retargetable Decision-Makers Tend To Seek Power

Neural Information Processing Systems 

In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive [Turner et al., 2021]. However, the real world is not fully observable, and trained agents need not be even approximately reward-optimal.