COBE: Contextualized Object Embeddings from Narrated Instructional Video
Facebook AI, 2
Neural Information Processing Systems
Many objects in the real world undergo dramatic variations in visual appearance. For example, a tomato may be red or green, sliced or chopped, fresh or fried, liquid or solid. Training a single detector to accurately recognize tomatoes in all these different states is challenging. On the other hand, contextual cues (e.g., the presence of a knife, a cutting board, a strainer, or a pan) are often strongly indicative of how the object appears in the scene. Recognizing such contextual cues is useful not only for improving the accuracy of object detection or determining the state of the object, but also for understanding its functional properties and inferring ongoing or upcoming human-object interactions.
Sep-29-2024, 17:47:12 GMT
- Country:
- North America > United States (0.68)
- Industry:
- Education > Educational Technology
- Audio & Video (0.42)
- Media (0.42)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks (0.68)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Communications > Social Media (0.86)