Policy Optimization with Second-Order Advantage Information