Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis