AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning
