Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL