Gradient Regularized V-Learning for Dynamic Treatment Regimes