Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

Kim, Taehyeong, Song, Hyeonseop, Zhang, Byoung-Tak

Jul-31-2022–arXiv.org Artificial Intelligence

Human infants learn the names of objects and develop their own conceptual systems without explicit supervision. In this study, we propose methods for learning aligned vision-language conceptual systems, inspired by infants' word learning mechanisms. The proposed model learns associations between visual objects and words online and gradually constructs cross-modal relational graph networks. We also propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner, based on the cross-modal relational graph networks. This allows entities from different modalities that share the same conceptual meaning to have similar semantic representation vectors. We evaluate our method quantitatively and qualitatively on tasks including object-to-word mapping and zero-shot learning, showing that the proposed model significantly outperforms the baselines and that the two conceptual systems are topologically aligned.
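
The idea of learning object-word associations online can be illustrated with a toy sketch. This is not the model proposed in the paper: it assumes a simple co-occurrence count between object labels and words, and the class `CrossModalGraph` and its methods `observe` and `best_word` are hypothetical names introduced here for illustration only. Each scene contributes edges between every visible object and every word heard, and an object is mapped to the word whose occurrences coincide with it most consistently.

```python
from collections import defaultdict


class CrossModalGraph:
    """Toy bipartite graph linking visual object labels to words.

    Hypothetical illustration only: edge weights are raw co-occurrence
    counts, which is an assumption and not the paper's learning rule.
    """

    def __init__(self):
        # weights[obj][word] = number of scenes where obj and word co-occurred
        self.weights = defaultdict(lambda: defaultdict(int))
        # word_totals[word] = number of scenes in which the word was heard
        self.word_totals = defaultdict(int)

    def observe(self, objects, words):
        """Update the graph online from one scene and its utterance."""
        unique_words = set(words)
        for word in unique_words:
            self.word_totals[word] += 1
        for obj in set(objects):
            for word in unique_words:
                self.weights[obj][word] += 1

    def best_word(self, obj):
        """Return the word most specifically associated with obj, or None."""
        candidates = self.weights.get(obj)
        if not candidates:
            return None
        # Normalize by word frequency so that words heard in every scene
        # score lower than words that appear mainly alongside this object.
        return max(
            candidates,
            key=lambda w: (candidates[w] / self.word_totals[w], candidates[w]),
        )


graph = CrossModalGraph()
graph.observe(["ball", "dog"], ["the", "dog", "ball"])
graph.observe(["ball", "cup"], ["the", "ball", "cup"])
graph.observe(["dog", "cup"], ["the", "dog", "cup"])
print(graph.best_word("ball"))
```

Because ambiguity is resolved only across scenes, a single observation is not enough to pick out a word; the mapping sharpens as more scenes are seen, which is the sense in which the graph is built gradually.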


