WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Burns, Andrea, Srinivasan, Krishna, Ainslie, Joshua, Brown, Geoff, Plummer, Bryan A., Saenko, Kate, Ni, Jianmo, Guo, Mandy

May-9-2023–arXiv.org Artificial Intelligence

Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little attention and structured image-text data underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite; the first to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks like page description generation, section summarization, and contextual image captioning.

artificial intelligence, natural language, wikiweb2m, (17 more...)

arXiv.org Artificial Intelligence

May-9-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.05)

Genre:
- Research Report (0.40)

Technology:
- Information Technology
  - Communications > Social Media (0.76)
  - Artificial Intelligence > Natural Language (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found