What should post-training optimize? A test-time scaling law perspective

Open in new window