Vision and language pretraining in the absence of caption annotations
Consider for a moment what it takes to visually identify and describe something to another person. Now imagine that the other person can't see the object or image, so every detail matters. How do you decide which information is important and which isn't? You'd need to know exactly what everything is, where it is, and what it's doing in relation to other objects, and to note attributes like color and whether objects sit in the foreground or background. This exercise shows that translating images into words is a complex task (one humans perform so often and so innately that it can seem automatic) requiring a wide range of knowledge about many unique things. To translate this skill into artificial intelligence (AI), we need to carefully consider and adapt models to the deep relationships between words and objects, the expected and unexpected ways they interrelate, and how context, such as an object's environment and pose, shapes the subtleties of associating and understanding new objects within categories.
Oct-15-2020, 05:25:57 GMT