Predicting and improving test-time scaling laws via reward tail-guided search
