JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning