POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes