The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation