BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning
