Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
