Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention

Nov-23-2020–arXiv.org Artificial Intelligence

Accurate and efficient product classification is significant for E-commerce applications, as it enables various downstream tasks such as recommendation, retrieval, and pricing. Items often contain textual and visual information, and utilizing both modalities usually outperforms classification utilizing either mode alone. In this ...

A drawback of these methods is that they consider only global image context, which may contain information irrelevant to the question. To overcome this, some methods have proposed visual attention models that attend to local spatial regions pertaining to a given question, and then perform multimodal fusion to classify answers accurately [4, 19, 21, 22]. More recently, dual attention models have been proposed.
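The idea of attending to local image regions that pertain to a text query, and then fusing the two modalities, can be illustrated with a minimal sketch. It is not the method of this paper or of the cited works: the dot-product scoring, the concatenation fusion, the toy vectors, and the names `attend` and `fuse` are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, regions):
    """Score each region feature against the text query (dot product,
    an assumed scoring function) and return the attention weights and
    the weighted sum of region features."""
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, attended


def fuse(query, attended):
    """Fuse text and attended image features by concatenation
    (a hypothetical choice; other fusion schemes exist)."""
    return list(query) + list(attended)


# Toy inputs: one text vector and three local region features (made up).
text_vec = [1.0, 0.0]
region_feats = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

weights, attended = attend(text_vec, region_feats)
fused = fuse(text_vec, attended)
```

In this toy setup the region most aligned with the text vector receives the largest weight, so the fused representation emphasizes local image content relevant to the text rather than the global image context.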



Country:
- Europe > France (0.04)
- North America > United States
  - California > Santa Clara County > Palo Alto (0.05)

Genre:
- Research Report (0.85)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
