Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Neural Information Processing Systems 

Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios.
