On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference