Microsoft's AI learns to answer questions about scenes from image-text pairs
Machines struggle to make sense of scenes and language without detailed accompanying annotations, but labeling is generally time-consuming and expensive, and even the best labels convey an understanding of scenes, not of language. To address the problem, Microsoft researchers built an AI system that trains on image-text pairs, mimicking the way humans improve their understanding of the world. They say their single-model encoder-decoder Vision-Language Pre-training (VLP) model, which can both generate image descriptions and answer natural-language questions about scenes, lays the groundwork for future frameworks that could reach human parity. A model pretrained on three million image-text pairs is available as open source on GitHub.
Oct-10-2019, 16:01:29 GMT