Learning Reward Machines through Preference Queries over Sequences

Open in new window