Deceptive Sequential Decision-Making via Regularized Policy Optimization