Visually-Augmented Language Modeling

Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

Feb-25-2023–arXiv.org Artificial Intelligence

Human language is grounded on multimodal knowledge, including visual knowledge such as colors, sizes, and shapes. However, current large-scale pre-trained language models rely on text-only self-supervised training with massive text data, which precludes them from utilizing relevant visual information when necessary. We propose VaLM, to Visually-augment text tokens with retrieved relevant images for Language Modeling. VaLM builds on a novel latent text-image alignment method via an image retrieval module to fetch corresponding images given a textual context. VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images. We evaluate VaLM on various visual knowledge-intensive commonsense reasoning tasks, which require visual information to excel. VaLM outperforms all strong language-only and vision-language baselines with substantial gains in reasoning about object commonsense, including color, size, and shape. Our code is available at https://github.com/Victorwz/VaLM.

Large-scale pre-trained language models (PLMs) have achieved great success in advancing the state of the art on various natural language understanding and generation tasks (Devlin et al., 2019; Radford et al., 2019; Liu et al., 2019; Yang et al., 2019; Brown et al., 2020; Wang et al., 2022). PLM self-supervised training largely benefits from harvesting local context information in the pre-training corpus.
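The retrieve-then-fuse idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of cosine similarity for retrieval, the single-head dot-product attention, and the two-dimensional embeddings are all assumptions made for illustration. The sketch only shows the general shape of the approach, which is to fetch the images closest to a text representation and then attend jointly over text and image vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_top_k(query, image_bank, k):
    """Hypothetical retrieval step: return the k image embeddings closest to the text query."""
    ranked = sorted(image_bank, key=lambda img: cosine(query, img), reverse=True)
    return ranked[:k]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(query, text_states, image_states):
    """Hypothetical fusion step: one attention head over text and image vectors together."""
    memory = text_states + image_states  # joint memory: text context, then retrieved images
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, mem)) / scale for mem in memory]
    weights = softmax(scores)
    fused = [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(len(query))]
    return fused, weights

# Toy data (assumed): 2-d embeddings standing in for text and image encoder outputs.
query = [1.0, 0.0]
text_states = [[1.0, 0.0], [0.5, 0.5]]
image_bank = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

retrieved = retrieve_top_k(query, image_bank, k=1)
fused, weights = fuse(query, text_states, retrieved)
```

In this sketch the attention weights cover both the text states and the retrieved image vectors, so the fused vector can draw on visual information whenever an image is relevant to the query. A real system would use learned encoders and a large image index in place of the toy vectors.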
