Zeroth-Order Optimization is Secretly Single-Step Policy Optimization
