Predicting and improving test-time scaling laws via reward tail-guided search