Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
