Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
