Yo'LLaVA: Your Personalized Language and Vision Assistant
Neural Information Processing Systems
Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in our surroundings. For example, one might ask, "What should I buy for my dog's birthday?";
May-29-2025, 09:48:31 GMT
- Country:
- North America > United States > Wisconsin (0.14)
- Genre:
- Personal (0.67)
- Research Report > Experimental Study (0.93)
- Technology: