The education community is moving towards constructed or free-text responses and computer-based assessment. At the same time, progress in natural language processing and knowledge representation has made it possible to consider free-text or constructed responses without having to fully understand the text.
Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions. On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models. The speed-up gain allows us to train the model with much more data. We hence combine our model with data generated by backtranslation from a neural machine translation model. On the SQuAD dataset, our single model, trained with augmented data, achieves 84.6 F1 score on the test set, which is significantly better than the best published F1 score of 81.8.
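To make the division of labor in the abstract above concrete, here is a minimal sketch of the two ingredients it names: a convolution that mixes each position with its immediate neighbors, and self-attention that lets every position draw on every other position. This is an illustration under simplifying assumptions, not the QANet implementation: the function names `local_conv` and `self_attention` are hypothetical, the attention uses the raw inputs as queries, keys, and values (no learned projections, no multiple heads), and the convolution applies one shared kernel to every feature.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_conv(seq, kernel):
    """Local interactions: zero-padded 1D convolution over positions.

    `seq` is a list of feature vectors; the same odd-length kernel is
    applied to every feature (an assumption made for brevity).
    """
    pad = len(kernel) // 2
    n, d = len(seq), len(seq[0])
    out = []
    for i in range(n):
        row = [0.0] * d
        for t, w in enumerate(kernel):
            pos = i + t - pad
            if 0 <= pos < n:  # positions outside the sequence count as zeros
                for j in range(d):
                    row[j] += w * seq[pos][j]
        out.append(row)
    return out


def self_attention(seq):
    """Global interactions: scaled dot-product self-attention.

    Queries, keys, and values are all the raw inputs (no learned weights).
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out
```

A call such as `self_attention(local_conv(seq, [0.25, 0.5, 0.25]))` chains the two steps. Neither function carries state from one position to the next, so every position can be processed in parallel, which is the property the abstract credits for the speed-up over RNNs.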
We demonstrate that current state-of-the-art approaches to Automated Essay Scoring (AES) are not well-suited to capturing adversarially crafted input of grammatical but incoherent sequences of sentences. We develop a neural model of local coherence that can effectively learn connectedness features between sentences, and propose a framework for integrating and jointly training the local coherence model with a state-of-the-art AES model. We evaluate our approach against a number of baselines and experimentally demonstrate its effectiveness on both the AES task and the task of flagging adversarial input, further contributing to the development of an approach that strengthens the validity of neural essay scoring models.
Automated essay scoring (AES) is a broadly used application of machine learning, with a long history of real-world use that impacts high-stakes decision-making for students. However, defensibility arguments in this space have typically been rooted in hand-crafted features and psychometrics research, which are a poor fit for recent advances in AI research and more formative classroom use of the technology. This paper proposes a framework for evaluating automated essay scoring models trained with more modern algorithms, used in a classroom setting; that framework is then applied to evaluate an existing product, Turnitin Revision Assistant.
If you asked most people, they'd probably say that computers and other gadgets are pretty good at communicating information to us, whether it's by providing directions to an important business meeting or finding the best recipe for gluten-free apple pie. And yet, computers still don't communicate with us nearly as intuitively as we communicate with each other. If you type a query into a search engine, for example, chances are you'll get a list of websites to click on. But if you ask a person a question, she'll respond with an answer, or perhaps ask another question to get more information before answering. Microsoft is hoping to improve how well computers can communicate information to us.
Machine comprehension of text is the problem of answering a query based on a given context. Many existing systems use RNN-based units for contextual modeling, combined with attention mechanisms. In this paper, however, we propose StackReader, an end-to-end neural network model that solves this problem without recurrent neural network (RNN) units or their variants. This simple model is based solely on attention mechanisms and gated convolutional neural networks. Experiments on SQuAD show that it achieves relatively high accuracy with a significant decrease in training time.
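The abstract above does not say which gating scheme StackReader uses. As an illustration only, the sketch below assumes the common gated-linear-unit form, in which one convolution produces content and a second convolution, passed through a sigmoid, decides how much of that content to let through. The names `conv1d` and `gated_conv1d` are hypothetical, and the input is a sequence of scalars to keep the example short.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(xs, kernel, bias=0.0):
    """Zero-padded 1D convolution that keeps the input length.

    Assumes an odd-length kernel so the window is centered on each position.
    """
    pad = len(kernel) // 2
    n = len(xs)
    out = []
    for i in range(n):
        acc = bias
        for t, w in enumerate(kernel):
            pos = i + t - pad
            if 0 <= pos < n:
                acc += w * xs[pos]
        out.append(acc)
    return out


def gated_conv1d(xs, content_kernel, gate_kernel, content_bias=0.0, gate_bias=0.0):
    """Gated convolution (GLU-style): content * sigmoid(gate), per position."""
    content = conv1d(xs, content_kernel, content_bias)
    gate = conv1d(xs, gate_kernel, gate_bias)
    return [c * sigmoid(g) for c, g in zip(content, gate)]
```

With an all-zero gate kernel the gate sits at 0.5 everywhere, so `gated_conv1d([1.0, 2.0, 3.0], [0, 1, 0], [0, 0, 0])` simply halves the input. In a trained model the gate kernel would be learned, so the gate can open or close depending on the local context.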
Tan, Chuanqi (Beihang University) | Wei, Furu (Microsoft Research) | Yang, Nan (Microsoft Research) | Du, Bowen (Beihang University) | Lv, Weifeng (Beihang University) | Zhou, Ming (Microsoft Research)
In this paper, we present a novel approach to machine reading comprehension for the MS-MARCO dataset. Unlike the SQuAD dataset, which aims to answer a question with exact text spans in a passage, the MS-MARCO dataset defines the task as answering a question from multiple passages, and the words in the answer are not necessarily in the passages. We therefore develop an extraction-then-synthesis framework to synthesize answers from extraction results. Specifically, the answer extraction model is first employed to predict the most important sub-spans from the passage as evidence, and the answer synthesis model takes the evidence as additional features along with the question and passage to further elaborate the final answers. We build the answer extraction model with state-of-the-art neural networks for single passage reading comprehension, and propose an additional task of passage ranking to help answer extraction in multiple passages. The answer synthesis model is based on sequence-to-sequence neural networks with the extracted evidence as features. Experiments show that our extraction-then-synthesis method outperforms state-of-the-art methods.
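The extraction step described above ends with choosing a sub-span of the passage as evidence. The abstract does not give the decoding rule, so the sketch below shows one widely used option as an assumption: given per-token start and end probabilities from some extraction model, pick the span whose start and end probabilities have the largest product, subject to the start not coming after the end. The function name `best_span` and the optional `max_len` limit are hypothetical.

```python
def best_span(start_probs, end_probs, max_len=None):
    """Return ((start, end), score) for the highest-scoring span.

    start_probs[i] and end_probs[j] are the model's probabilities that the
    evidence starts at token i and ends at token j (inclusive). The span
    score is their product; `max_len` optionally caps the span length.
    """
    n = len(end_probs)
    best, best_score = (0, 0), -1.0
    for i, p_start in enumerate(start_probs):
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):  # enforce start <= end
            score = p_start * end_probs[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score
```

In an extraction-then-synthesis setup, the tokens inside the returned span would then be marked as evidence features for the synthesis model. How the paper combines this with passage ranking is not specified in the abstract and is not modeled here.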
In this paper, we focus on multiple-choice reading comprehension, which aims to answer a question given a passage and multiple candidate options. We present the hierarchical attention flow to adequately leverage candidate options to model the interactions among passages, questions, and candidate options. We observe that leveraging candidate options to boost evidence gathering from the passages plays a vital role in this task, which has been ignored in previous work. In addition, we explicitly model the option correlations with an attention mechanism to obtain better option representations, which are further fed into a bilinear layer to obtain the ranking score for each option. On a large-scale multiple-choice reading comprehension dataset (i.e., the RACE dataset), the proposed model outperforms two previous neural network baselines on both the RACE-M and RACE-H subsets and yields state-of-the-art overall results.
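The final scoring step mentioned above, a bilinear layer that turns each option representation into a ranking score, can be sketched as follows. This is a minimal illustration, not the paper's model: the abstract does not say what the option vector is scored against, so the `context_vec` argument (for example, a combined passage and question summary) is an assumption, and the names `bilinear_score` and `rank_options` are hypothetical. The score is the bilinear form o^T W c, followed by a softmax over options.

```python
import math


def bilinear_score(option_vec, weight, context_vec):
    """Bilinear form o^T W c for one option vector o and context vector c."""
    return sum(
        o * sum(w * c for w, c in zip(row, context_vec))
        for o, row in zip(option_vec, weight)
    )


def rank_options(option_vecs, weight, context_vec):
    """Score every option, then normalize the scores with a softmax.

    Returns (index of the best option, list of option probabilities).
    """
    scores = [bilinear_score(o, weight, context_vec) for o in option_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

With an identity weight matrix the score reduces to a plain dot product, so the option most aligned with the context wins. A learned `weight` lets the model compare options and context in different feature spaces.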
Machines equipped with artificial intelligence (AI) have performed better than human beings in a high-level test of reading comprehension. Two natural language processing tools received higher test scores than humans in recent exams. One of the tools is a product of the American software maker Microsoft. The other was created by the Chinese online seller Alibaba Group.