Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation