Invariance in Policy Optimisation and Partial Identifiability in Reward Learning
