Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
