Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning

Neural Information Processing Systems

Language is the dress of thought.