InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Neural Information Processing Systems
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input.
Oct-9-2025, 02:28:18 GMT