Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
