Large Reasoning Models Learn Better Alignment from Flawed Thinking
