An Advantage-based Optimization Method for Reinforcement Learning in Large Action Space