"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.
Years before everyone was impressed by the human-like text output of ChatGPT and other generative AI systems, IBM's Watson was blowing our minds on Jeopardy. IBM's cognitive computing project famously dominated its human opponents, but the company had much larger long-term goals, such as using Watson's ability to simulate a human thought process to help doctors diagnose patients and recommend treatments. Now, IBM is pivoting its supercomputer platform into Watsonx, an AI development studio packed with foundation and open-source models that companies can use to train their own AI platforms. If that sounds familiar, it may be because NVIDIA recently announced a similar service with its AI Foundations program. Both platforms are designed to give enterprises a way to build, train, scale, and deploy an AI platform.
In early 2011, Ken Jennings looked like humanity's last hope. Watson, an artificial intelligence created by the tech giant IBM, had picked off lesser Jeopardy players before the show's all-time champ entered a three-day exhibition match. At the end of the first game, Watson--a machine the size of 10 refrigerators--had Jennings on the ropes, leading $35,734 to $4,800. On day three, Watson finished the job. "I for one welcome our new computer overlords," Jennings wrote on his video screen during Final Jeopardy. Watson was better than any previous AI at addressing a problem that had long stumped researchers: How do you get a computer to precisely understand a clue posed in idiomatic English and then spit out the correct answer (or, as in Jeopardy, the right question)?
There were three options for the course final project. Students either chose their own topic as a custom final project or took part in one of the default final projects, which involved building question-answering systems. This year, we had two default final project options: in the regular (IID) track, students built question-answering models from scratch for the SQuAD 2.0 challenge; in the Robust QA track, students started with three question-answering datasets (SQuAD, Natural Questions, and NewsQA) and a pre-trained transformer QA system, and worked to produce a system that performed robustly on out-of-domain (OOD) test sets from additional domains. You can find links to previous years' reports under Previous Offerings on the homepage.
This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.
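The abstract's core idea — projecting an image feature and a question encoding into a shared embedding space and classifying over answers — can be sketched roughly as below. This is a minimal illustration, not the paper's model: the dimensions, the random placeholder features, and the bag-of-words question encoding are all assumptions (the paper uses a CNN and an LSTM).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4096-d CNN image feature, 300-d word embeddings,
# a 128-d shared embedding space, and a small answer vocabulary.
D_IMG, D_WORD, D_EMB, N_ANSWERS = 4096, 300, 128, 10

W_img = rng.normal(0, 0.01, (D_EMB, D_IMG))   # projects image features
W_txt = rng.normal(0, 0.01, (D_EMB, D_WORD))  # projects question encoding
W_out = rng.normal(0, 0.01, (N_ANSWERS, D_EMB))

def answer_logits(image_feat, word_embs):
    """Score each candidate answer for an (image, question) pair."""
    q = word_embs.mean(axis=0)                        # bag-of-words question encoding
    joint = np.tanh(W_img @ image_feat + W_txt @ q)   # shared embedding space
    return W_out @ joint

image_feat = rng.normal(size=D_IMG)            # stand-in for a CNN feature vector
word_embs = rng.normal(size=(5, D_WORD))       # stand-in for 5 word vectors
logits = answer_logits(image_feat, word_embs)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax over candidate answers
```

In training, the projection and output matrices would be learned end-to-end from question-answer pairs; here they are random to keep the sketch self-contained.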
The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
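A second-order version of the cross-modal correlation the abstract describes can be sketched as follows. This is an illustrative toy, not the paper's architecture: the bilinear weight matrix, feature sizes, and the max-marginalization used to turn pairwise correlations into per-modality attention are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs: 6 image regions and 4 question tokens, each a 16-d feature.
V = rng.normal(size=(6, 16))      # visual features, one row per region
T = rng.normal(size=(4, 16))      # textual features, one row per token
W = rng.normal(0, 0.1, (16, 16))  # learned bilinear correlation weights (random here)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Second-order correlation score for every (region, token) pair.
C = V @ W @ T.T                   # shape (6, 4)

# Marginalize over the other modality to obtain per-element attention weights.
attn_v = softmax(C.max(axis=1))   # attention over image regions
attn_t = softmax(C.max(axis=0))   # attention over question tokens

v_attended = attn_v @ V           # attended visual summary vector
t_attended = attn_t @ T           # attended textual summary vector
```

The point of the sketch is the shared correlation tensor `C`: both modalities' attention distributions are derived from the same pairwise scores, so the attended elements in each modality are chosen jointly.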
We present an approach to map utterances in conversation to logical forms, which are then executed on a large-scale knowledge base. To handle the pervasive ellipsis phenomena in conversation, we introduce dialog memory management to manipulate historical entities, predicates, and logical forms when inferring the logical form of the current utterance. Dialog memory management is embodied in a generative model, in which a logical form is interpreted in a top-down manner following a small and flexible grammar. We learn the model from denotations without explicit annotation of logical forms, and evaluate it on a large-scale dataset consisting of 200K dialogs over 12.8M entities. Results verify the benefits of modeling dialog memory, and show that our semantic parsing-based approach outperforms a memory-network-based encoder-decoder model by a large margin.
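The role of the dialog memory in resolving ellipsis can be illustrated with a deliberately tiny sketch. This is not the paper's generative model: the `DialogMemory` class, the (predicate, entity) logical-form tuples, and the recency-based resolution rule are all hypothetical simplifications.

```python
# Toy dialog memory: entities and predicates from earlier turns are kept so
# that an elliptical follow-up utterance can reuse them when its own logical
# form is missing a slot.
class DialogMemory:
    def __init__(self):
        self.entities = []    # entities mentioned in earlier turns
        self.predicates = []  # predicates used in earlier logical forms

    def record(self, logical_form):
        """Store the parts of a completed turn's logical form."""
        pred, ent = logical_form
        if pred not in self.predicates:
            self.predicates.append(pred)
        if ent not in self.entities:
            self.entities.append(ent)

    def resolve(self, pred=None, ent=None):
        """Fill any missing slot from the most recent history."""
        return (pred or self.predicates[-1], ent or self.entities[-1])

mem = DialogMemory()
mem.record(("capital_of", "France"))   # turn 1: "What is the capital of France?"
lf = mem.resolve(ent="Germany")        # turn 2 (elliptical): "And of Germany?"
print(lf)  # ('capital_of', 'Germany')
```

The elliptical second turn never names a predicate; the memory supplies `capital_of` from the previous turn, which is the intuition behind manipulating historical entities and predicates.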
This ignores the inherent graph structure of the knowledge base, and reasons over facts one at a time to reach an answer, which is computationally inefficient. Two entities have a connecting edge if they belong to the same fact. Strengths -- The proposed approach is intuitive, sufficiently novel, and outperforms prior work by a large margin -- 10% better than the previous best approach, which is an impressive result. Weaknesses -- Given that the fact retrieval step is still the bottleneck in terms of accuracy (Table 4), it would be useful to check how sensitive downstream accuracy is to the choice of retrieving 100 facts. What is the answering accuracy if 50 facts are retrieved?
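The edge rule the review states — two entities are connected if they belong to the same fact — can be sketched directly. The example facts below are made up for illustration.

```python
# Build an entity graph from (entity, relation, entity) fact triples:
# per the rule above, the two entities of each fact share an edge.
facts = [
    ("cat", "is_a", "animal"),
    ("cat", "has", "whiskers"),
    ("dog", "is_a", "animal"),
]

edges = set()
for e1, _relation, e2 in facts:
    edges.add(frozenset((e1, e2)))   # undirected edge between co-occurring entities

print(sorted(tuple(sorted(e)) for e in edges))
```

Note that "cat" and "dog" end up unconnected even though both relate to "animal": edges come only from direct co-occurrence within a fact, not from shared neighbors.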
Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction a novel 'fact-based' visual question answering (FVQA) task has been introduced recently along with a large set of curated facts which link two entities, i.e., two possible answers, via a relation. Given a question-image pair, deep network techniques have been employed to successively reduce the large set of facts until one of the two entities of the final remaining fact is predicted as the answer. We observe that a successive process which considers one fact at a time to form a local decision is sub-optimal. Instead, we develop an entity graph and use a graph convolutional network to 'reason' about the correct answer by jointly considering all entities. We show on the challenging FVQA dataset that this leads to an improvement in accuracy of around 7% compared to the state of the art.
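The contrast the abstract draws — updating all entities jointly instead of one fact at a time — is what a graph convolution does. Below is a single, generic GCN layer over a tiny entity graph; the adjacency matrix, feature sizes, and random weights are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# One graph-convolution step over a 3-entity graph: each entity's
# representation is updated from its neighbors' features simultaneously,
# so candidate answers are reasoned about jointly.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # adjacency: entity 0 links to 1 and 2
A_hat = A + np.eye(3)                    # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt # symmetric normalization

rng = np.random.default_rng(2)
H = rng.normal(size=(3, 8))              # an 8-d feature per entity
W = rng.normal(0, 0.1, (8, 8))           # learned layer weights (random here)

H_next = np.maximum(0, A_norm @ H @ W)   # one GCN layer with ReLU
```

Stacking a few such layers lets information flow along multi-hop paths in the entity graph, which is the "joint reasoning" the abstract credits for the accuracy gain.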
Summary This paper presents a question-answering system based on tensor product representations. Given a latent sentence encoding, different MLPs extract entity and relation representations, which are then used to update an order-3 tensor product representation, trained end-to-end from the downstream success of correctly answering the question. Experiments are limited to bAbI question answering, which is disappointing as this is a synthetic corpus with a simple known underlying triples structure. While the proposed system outperforms baselines like recurrent entity networks (RENs) by a small difference in mean error, RENs have also been applied to more real-world tasks such as the Children's Book Test (CBT). Strengths - I like that the authors do not just report the best performance of their model, but also the mean and variance from five runs.
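The order-3 tensor product representation the review mentions can be illustrated with one-hot role vectors: a (subject, relation, object) triple is bound into the tensor by an outer product and retrieved by contraction. This is a textbook TPR toy, not the reviewed system, where MLPs produce learned, distributed encodings.

```python
import numpy as np

# Order-3 tensor product representation over a 4-d role space.
D = 4
T = np.zeros((D, D, D))

def vec(i):
    """One-hot role vector (stand-in for a learned encoding)."""
    v = np.zeros(D)
    v[i] = 1.0
    return v

subj, rel, obj = vec(0), vec(1), vec(2)        # e.g. a (Mary, moved_to, kitchen) triple
T += np.einsum("i,j,k->ijk", subj, rel, obj)   # bind and store the triple

# Query: contract the tensor with subject and relation to recover the object.
recovered = np.einsum("ijk,i,j->k", T, subj, rel)
print(int(recovered.argmax()))  # 2, the index of the stored object
```

With one-hot vectors the retrieval is exact; with learned, non-orthogonal encodings (as in the paper), multiple stored triples interfere, and training end-to-end shapes the encodings to keep retrieval usable.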