Large Reasoning Models Learn Better Alignment from Flawed Thinking