Learning Reward Machines through Preference Queries over Sequences