WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

arXiv.org Artificial Intelligence 

Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. As a result, webpage tasks have received little attention, and structured image-text data has been underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite, the first to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks like page description generation, section summarization, and contextual image captioning.
