DesCo: Learning Object Recognition with Rich Language Descriptions

Jan-19-2025, 08:09:19 GMT–Neural Information Processing Systems

Recent developments in vision-language approaches have driven a paradigm shift toward learning visual recognition models from language supervision. These approaches align objects with language queries (e.g., "a photo of a cat") and thus improve the models' adaptability to novel objects and domains. Recent studies have attempted to query these models with complex language expressions that specify fine-grained details such as colors, shapes, and relations. However, simply incorporating language descriptions into queries does not guarantee that the models interpret them accurately.

language description, learning object recognition, rich language description

Genre:
- Research Report (0.41)

Technology:
- Information Technology > Artificial Intelligence > Vision (1.00)