Learning Reward Machines from Partially Observed Optimal Policies