CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision