MultiModalQA: Complex Question Answering over Text, Tables and Images
Talmor, Alon, Yoran, Ori, Catav, Amnon, Lahav, Dan, Wang, Yizhong, Asai, Akari, Ilharco, Gabriel, Hajishirzi, Hannaneh, Berant, Jonathan
When answering complex questions, people can seamlessly combine information from visual, textual and tabular sources. While interest in models that reason over multiple pieces of evidence has surged in recent years, there has been relatively little work on question answering models that reason across multiple modalities. We present MultiModalQA (MMQA): a challenging question answering dataset that requires joint reasoning over text, tables and images. We create MMQA using a new framework for generating complex multi-modal questions at scale, harvesting tables from Wikipedia and attaching images and text paragraphs using entities that appear in each table. We then define a formal language that allows us to take questions that can be answered from a single modality and combine them to generate cross-modal questions. Last, crowdsourcing workers take these automatically generated questions and rephrase them into more fluent language.

When presented with a complex question, people often do not know in advance which source(s) of information are relevant for answering it. In general scenarios, these sources can span multiple modalities, be they paragraphs of text, structured tables, images, or combinations thereof. For instance, a user might ponder "When was the famous painting with two touching fingers completed?". Answering this question requires integrating information across the textual and visual modalities. Recently, there has been substantial interest in question answering (QA) models that reason over multiple pieces of evidence, i.e., multi-hop questions (Yang et al., 2018; Talmor & Berant, 2018; Welbl et al., 2017). In most prior work, the question is phrased in natural language and the answer lies in a context, which may be a paragraph (Rajpurkar et al., 2016), a table (Pasupat & Liang, 2015), or an image (Antol et al., 2015). However, there has been relatively little work on answering questions that require integrating information across modalities.
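To make the composition step concrete, below is a minimal sketch of how a single-modality question whose answer is a bridge entity might be substituted into the entity slot of another single-modality question to form a cross-modal question. The names used here (SingleModalityQuestion, compose_bridge, surface_form) are illustrative assumptions for this sketch, not the released MMQA generation code.

```python
# Hypothetical sketch of cross-modal question composition; the classes and
# helpers below are illustrative assumptions, not the MMQA implementation.
from dataclasses import dataclass


@dataclass
class SingleModalityQuestion:
    modality: str       # "text", "table", or "image"
    question: str       # natural-language question (may contain an {entity} slot)
    answer: str         # gold answer
    surface_form: str   # noun phrase describing the answer, used for bridging


def compose_bridge(outer: SingleModalityQuestion,
                   inner: SingleModalityQuestion) -> dict:
    """Substitute a description of the inner question's answer (the bridge
    entity) into the entity slot of the outer question, yielding a question
    that requires both modalities to answer."""
    return {
        "question": outer.question.format(entity=inner.surface_form),
        "answer": outer.answer,
        "modalities": [inner.modality, outer.modality],
    }


# An image question identifies a painting; a text question asks for its date.
inner_q = SingleModalityQuestion(
    modality="image",
    question="Which famous painting shows two touching fingers?",
    answer="The Creation of Adam",
    surface_form="the famous painting with two touching fingers",
)
outer_q = SingleModalityQuestion(
    modality="text",
    question="When was {entity} completed?",
    answer="1512",
    surface_form="1512",
)

print(compose_bridge(outer_q, inner_q))
# {'question': 'When was the famous painting with two touching fingers completed?',
#  'answer': '1512', 'modalities': ['image', 'text']}
```

In this sketch, crowdsourcing workers would then rephrase the automatically composed question into more fluent language, as described above.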