TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning