FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
