Single-stream Policy Optimization