 Education





Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Neural Information Processing Systems

When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (SCIENCEQA), a new benchmark that consists of 21k multimodal multiple-choice questions with diverse science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering SCIENCEQA questions. SCIENCEQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.


Play to Grade: Testing Coding Games as Classifying Markov Decision Process

Neural Information Processing Systems

Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse-based games. While pedagogically compelling, there are no contemporary autonomous methods for providing feedback. Notably, interactive programs are impossible to grade by traditional unit tests.


Supplementary Material for Paper "Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs"

Neural Information Processing Systems

For example, the MatMul operation of TensorFlow has 'MatMul' as … As with the call id stack, Terra manages the loop id stack for the entire program execution.

Figure 2: The result of the case assignment algorithm for the given TraceGraph.

In this section, we describe the case assignment algorithm that Terra uses to explicitly insert the Switch-Case operations in the symbolic graph. The algorithm takes a TraceGraph as an input and returns an ordered list of switch-cases. A switch-case is a set of (basic block, control edges) pairs, where the basic block is a linear chain of nodes and the control edges are the edges that point to the basic block. Every non-overlapping linear chain of nodes in the TraceGraph is uniquely assigned to a basic block, so that the ordered list of switch-cases can cover every trace in the TraceGraph.
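
The decomposition into basic blocks can be illustrated with a small Python sketch. This is not Terra's implementation: it assumes the TraceGraph is given as a plain DAG (a topologically ordered node list plus an edge list), and the function name `assign_basic_blocks` is hypothetical. It covers only the split into non-overlapping linear chains and the collection of their incoming control edges; how Terra groups these pairs into switch-cases is not specified in the excerpt and is left out.

```python
from collections import defaultdict


def assign_basic_blocks(nodes, edges):
    """Split a DAG into non-overlapping linear chains (basic blocks).

    Assumptions (not taken from the Terra paper):
      nodes: node ids listed in topological order
      edges: (src, dst) pairs
    Returns a list of (basic_block, control_edges) pairs, ordered by
    the position of each block's first node in `nodes`.
    """
    succs = defaultdict(list)
    preds = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    def starts_block(node):
        # A node continues a chain only if it has exactly one predecessor
        # and that predecessor has exactly one successor.
        p = preds[node]
        return len(p) != 1 or len(succs[p[0]]) != 1

    result = []
    for node in nodes:
        if not starts_block(node):
            continue
        block = [node]
        cur = node
        # Extend the chain while the link to the next node is one-to-one.
        while len(succs[cur]) == 1 and len(preds[succs[cur][0]]) == 1:
            cur = succs[cur][0]
            block.append(cur)
        # Control edges are the edges that point to the head of the block.
        control_edges = [(p, node) for p in preds[node]]
        result.append((block, control_edges))
    return result


# A diamond-shaped graph: a -> b, b branches to c and d, both merge into e -> f.
example_nodes = ["a", "b", "c", "d", "e", "f"]
example_edges = [("a", "b"), ("b", "c"), ("b", "d"),
                 ("c", "e"), ("d", "e"), ("e", "f")]
blocks = assign_basic_blocks(example_nodes, example_edges)
```

In this example each node ends up in exactly one block, and the merge block starting at `e` carries two control edges, one from each branch.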