DesCo: Learning Object Recognition with Rich Language Descriptions
–Neural Information Processing Systems
Recent development in vision-language approaches has instigated a paradigm shift in learning visual recognition models from language supervision. These approaches align objects with language queries (e.g. "a photo of a cat") and thus improve the models' adaptability to novel objects and domains. Recent studies have attempted to query these models with complex language expressions that include specifications of fine-grained details, such as colors, shapes, and relations. However, simply incorporating language descriptions into queries does not guarantee accurate interpretation by the models.
Neural Information Processing Systems
Jan-19-2025, 08:09:19 GMT
- Genre:
- Research Report (0.41)
- Technology:
- Information Technology > Artificial Intelligence > Vision (1.00)