STREET: A Multi-Task Structured Reasoning and Explanation Benchmark

Ribeiro, Danilo, Wang, Shen, Ma, Xiaofei, Zhu, Henry, Dong, Rui, Kong, Deguang, Burger, Juliette, Ramos, Anjelica, Wang, William, Huang, Zhiheng, Karypis, George, Xiang, Bing, Roth, Dan

arXiv.org Artificial Intelligence 

Unlike most existing question-answering (QA) datasets, we expect models not only to answer questions, but also to produce step-by-step structured explanations describing how premises in the question are used to derive intermediate conclusions that can prove the correctness of a certain answer. We perform extensive evaluation with popular language models such as few-shot prompted GPT-3 and fine-tuned T5. We find that these models still lag behind human performance when producing such structured reasoning steps. We believe this work will provide a way for the community to better train and test systems on multi-step reasoning and explanations in natural language.

A long-term pursuit in Artificial Intelligence is to endow machines with the ability to reason over and manipulate premises to reach conclusions and perform tasks. Recent work in question answering has demonstrated that language models can learn to reason directly over natural language (Clark et al., 2020), allowing for more flexible and adaptable reasoning capabilities. Another advantage of performing multi-step reasoning over natural language is that it yields more inspectable outputs, improving the explainability of models that are otherwise regarded as black-box systems (Jain & Wallace, 2019; Rajani et al., 2019a; Danilevsky et al., 2020). Despite this recent progress, there is still a gap in resources for training and evaluating general reasoning capabilities over natural language. We build upon existing QA datasets by adding multi-premise, multi-step, structured explanations in the form of reasoning graphs, as depicted in Figure 1. Combined, the reasoning graphs contain a total of 151.1k reasoning steps (or textual entailments), of which 14.7k were created by our expert annotators.
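As an illustration of the kind of structure a reasoning graph encodes, the following Python sketch models premises, intermediate conclusions, and the answer as nodes, and each reasoning step (a textual entailment) as an edge from its supporting nodes to the conclusion it produces. The class and field names here are assumptions made for this sketch only and do not reflect the dataset's actual release format.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    text: str
    kind: str  # assumed labels: "premise", "intermediate", or "answer"

@dataclass
class ReasoningStep:
    inputs: list   # node_ids of the premises/conclusions used by this step
    output: str    # node_id of the conclusion this step entails

@dataclass
class ReasoningGraph:
    nodes: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_step(self, step: ReasoningStep) -> None:
        # Each step is a textual entailment: its input nodes jointly
        # support the output node.
        self.steps.append(step)

    def answer_is_supported(self) -> bool:
        """Check that every non-premise node is derived by some step."""
        derived = {s.output for s in self.steps}
        return all(
            n.node_id in derived
            for n in self.nodes.values()
            if n.kind != "premise"
        )

# Toy example: two premises entail an intermediate conclusion,
# which in turn supports the answer.
g = ReasoningGraph()
g.add_node(Node("p1", "All squares are rectangles.", "premise"))
g.add_node(Node("p2", "Rectangles have four sides.", "premise"))
g.add_node(Node("i1", "Squares have four sides.", "intermediate"))
g.add_node(Node("a1", "Yes, a square has four sides.", "answer"))
g.add_step(ReasoningStep(["p1", "p2"], "i1"))
g.add_step(ReasoningStep(["i1"], "a1"))
assert g.answer_is_supported()
```

In this framing, evaluating a model's structured explanation amounts to comparing its predicted steps (input sets and entailed conclusions) against the annotated reasoning graph, rather than checking only the final answer.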
