All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment