TD-M(PC)$^2$: Improving Temporal Difference MPC Through Policy Constraint