 Question Answering


Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

arXiv.org Artificial Intelligence

We propose a simple method to generate large amounts of multilingual question and answer pairs with a single generative model. These synthetic samples are then used to augment the available gold multilingual ones to improve the performance of multilingual QA models on target languages. Our approach requires only the existence of automatically translated samples from English to the target domain, thus removing the need for human annotations in the target languages. Experimental results show our proposed approach achieves significant gains on a number of multilingual datasets.


Knowledge-enriched, Type-constrained and Grammar-guided Question Generation over Knowledge Bases

arXiv.org Artificial Intelligence

Question generation over knowledge bases (KBQG) aims at generating natural-language questions about a subgraph, i.e. a set of (connected) triples. Two main challenges still face the current crop of encoder-decoder-based methods, especially on small subgraphs: (1) low diversity and poor fluency due to the limited information contained in the subgraphs, and (2) semantic drift due to the decoder's oblivion of the semantics of the answer entity. We propose an innovative knowledge-enriched, type-constrained and grammar-guided KBQG model, named KTG, to address the above challenges. In our model, the encoder is equipped with auxiliary information from the KB, and the decoder is constrained with word types during QG. Specifically, entity domain and description, as well as relation hierarchy information, are considered to construct question contexts, while a conditional copy mechanism is incorporated to modulate question semantics according to current word types. In addition, a novel reward function featuring grammatical similarity is designed to improve both generative richness and syntactic correctness via reinforcement learning. Extensive experiments show that our proposed model outperforms existing methods by a significant margin on two widely used benchmark datasets, SimpleQuestion and PathQuestion.


Now With IBM Watson Assistant Your Chatbot Can Learn Automatically

#artificialintelligence

On 19 August 2020, IBM Watson Assistant launched autolearning. The tagline from IBM is "Empower your skill to learn automatically with autolearning." This sounds very promising, and is indeed a step in the right direction. The big question, of course, is to what extent it learns automatically. For a full and detailed report on Watson Assistant's Disambiguation Function, I suggest this article: The ideal chatbot conversation is just that, conversation-like, in natural language and highly unstructured.


Knowledge Distillation for Improved Accuracy in Spoken Question Answering

arXiv.org Artificial Intelligence

Spoken question answering (SQA) is a challenging task that requires the machine to fully understand complex spoken documents. Automatic speech recognition (ASR) plays a significant role in the development of QA systems. However, recent work shows that ASR systems generate highly noisy transcripts, which critically limit the capability of machine comprehension on the SQA task. To address the issue, we present a novel distillation framework. Specifically, we devise a training strategy to perform knowledge distillation (KD) from spoken documents and their written counterparts. Our work makes a step towards distilling knowledge from the language model as a supervision signal that leads to better student accuracy by reducing the misalignment between automatic and manual transcriptions. Experiments demonstrate that our approach outperforms several state-of-the-art language models on the Spoken-SQuAD dataset.
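
To make the idea of knowledge distillation in the abstract above concrete, here is a minimal sketch of the generic temperature-softened distillation loss. It is not the paper's training strategy: the function names, the `temperature` and `alpha` parameters, and the mixing with a cross-entropy term are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index,
                      temperature=2.0, alpha=0.5):
    """Generic KD loss (an assumption, not the paper's exact objective).

    Mixes KL(teacher || student) on temperature-softened distributions
    with the usual cross-entropy on the gold label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student_soft) if t > 0)
    # Cross-entropy of the unsoftened student prediction on the gold label.
    cross_entropy = -math.log(softmax(student_logits)[gold_index])
    # temperature**2 keeps the soft-target term on a comparable scale.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * cross_entropy
```

In a spoken QA setting, one would presumably feed the teacher manual transcripts and the student ASR transcripts, so the KL term pulls the student's predictions on noisy input towards the teacher's predictions on clean input.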


Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition

arXiv.org Artificial Intelligence

A booming amount of information is continuously added to the Internet as structured and unstructured data, feeding knowledge bases such as DBpedia and Wikidata with billions of statements describing millions of entities. The aim of Question Answering systems is to allow lay users to access such data using natural language without needing to write formal queries. However, users often submit questions that are complex and require a certain level of abstraction and reasoning to decompose them into basic graph patterns. In this short paper, we explore the use of architectures based on Neural Machine Translation called Neural SPARQL Machines to learn pattern compositions. We show that sequence-to-sequence models are a viable and promising option to transform long utterances into complex SPARQL queries.


Kwame: A Bilingual AI Teaching Assistant for Online SuaCode Courses

arXiv.org Artificial Intelligence

Introductory hands-on courses, such as our smartphone-based coding courses, SuaCode, require a lot of support for students to accomplish learning goals. Online environments make it even more difficult to get assistance, especially more recently because of COVID-19. Given the multilingual context of our students (learners across 38 African countries), in this work we developed an AI Teaching Assistant (Kwame) that provides answers to students' coding questions from our SuaCode courses in English and French. Kwame is a Sentence-BERT (SBERT)-based question-answering (QA) system that we trained and evaluated using question-answer pairs created from our course's quizzes and students' questions in past cohorts. It finds the paragraph most semantically similar to the question via cosine similarity. We compared the system with TF-IDF and Universal Sentence Encoder. Our results showed that SBERT performed worst on duration (6 seconds per question) but best on accuracy, and that fine-tuning on our course data improved the result.
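
The retrieval step described above (pick the paragraph with the highest cosine similarity to the question) can be sketched as follows. This is only an illustration: a simple bag-of-words count vector stands in for the SBERT sentence embeddings, and `embed`, `cosine_similarity`, and `most_similar_paragraph` are hypothetical names, not part of Kwame.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in encoder (assumption): bag-of-words term counts.

    A real system would return a dense sentence embedding here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_paragraph(question, paragraphs):
    """Return the best-matching paragraph and its similarity score."""
    query = embed(question)
    scores = [cosine_similarity(query, embed(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return paragraphs[best], scores[best]
```

Swapping `embed` for a dense sentence encoder leaves the ranking logic the same in spirit; only the vector representation and the dot-product arithmetic change.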


Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering

arXiv.org Artificial Intelligence

Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow given the speech utterances and text corpora. Different from traditional text question answering (QA) tasks, SCQA involves audio signal processing, passage comprehension, and contextual understanding. However, ASR systems introduce unexpected noisy signals to the transcriptions, which result in performance degradation on SCQA. To overcome the problem, we propose CADNet, a novel contextualized attention-based distillation approach, which applies both cross-attention and self-attention to obtain ASR-robust contextualized embedding representations of the passage and dialogue history for performance improvements. We also introduce the spoken conventional knowledge distillation framework to distill the ASR-robust knowledge from the estimated probabilities of the teacher model to the student. We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance in this task.


Unsupervised Deep Learning based Multiple Choices Question Answering: Start Learning from Basic Knowledge

arXiv.org Artificial Intelligence

In this paper, we study the possibility of almost unsupervised Multiple Choices Question Answering (MCQA). Starting from very basic knowledge, an MCQA model knows that some choices have higher probabilities of being correct than others. This information, though very noisy, guides the training of an MCQA model. The proposed method is shown to outperform the baseline approaches on RACE and is even comparable with some supervised learning approaches on MC500.


IBM Watson Just Analysed a TV Debate. Read to Know How

#artificialintelligence

Bloomberg Television's show "That's Debatable" had an unusual participant on its show broadcast on October 9. In a debate on the topic "Is it time to redistribute the world's wealth?", IBM Watson synthesised thousands of responses and opinions received from the public to incorporate into the debate. IBM Watson used a new natural language processing feature called key point analysis, which categorises and summarises thousands of public opinions into a handful of concrete key points. Key point analysis is basically the next generation of 'extractive summarisation', which processes statements in a given text document to summarise the most significant points.


Open Question Answering over Tables and Text

arXiv.org Artificial Intelligence

In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider for the first time open QA over both tabular and textual data and present a new large-scale dataset, Open Table-Text Question Answering (OTT-QA), to evaluate performance on this task. Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging. Our baseline model, using an iterative retriever and BERT-based reader, achieves an exact match score of less than 10%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is to use "early fusion" to group multiple highly relevant tabular and textual units into a fused block, which provides more context for the retriever to search for. The second technique is to use a cross-block reader to model the cross-dependency between multiple retrieved pieces of evidence with global-local sparse attention. Combining these two techniques improves the score significantly, to above 27%.
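
As a rough illustration of the "early fusion" idea in the abstract above, the sketch below groups each table row with the passages linked from its cells into a single retrievable text block. Everything here is an assumption for illustration: the function name `fuse_blocks`, the row serialisation format, and the `linked_passages` lookup are hypothetical, and the paper's actual linking and block construction are not reproduced.

```python
def fuse_blocks(table_title, header, rows, linked_passages):
    """Build one fused text block per table row.

    linked_passages maps a cell string to a passage about that cell
    (a hypothetical, pre-computed link; real entity linking is omitted).
    """
    blocks = []
    for row in rows:
        # Serialise the row as "column is value" pairs.
        cells = "; ".join(f"{h} is {c}" for h, c in zip(header, row))
        # Collect passages linked from any cell in this row.
        passages = [linked_passages[c] for c in row if c in linked_passages]
        # The fused block is the row text followed by its linked passages.
        blocks.append(" ".join([f"{table_title}. {cells}."] + passages))
    return blocks
```

A retriever indexing these fused blocks sees the tabular and textual evidence for a row together, which is the extra context the abstract refers to.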