Question Answering


Interpretable Question Answering on Knowledge Bases and Text

arXiv.org Artificial Intelligence

Interpretability of machine learning (ML) models becomes more relevant with their increasing adoption. In this work, we address the interpretability of ML-based question answering (QA) models on a combination of knowledge bases (KB) and text documents. We adapt post hoc explanation methods such as LIME and input perturbation (IP) and compare them with the self-explanatory attention mechanism of the model. For this purpose, we propose an automatic evaluation paradigm for explanation methods in the context of QA. We also conduct a study with human annotators to evaluate whether explanations help them identify better QA models. Our results suggest that IP provides better explanations than LIME or attention, according to both automatic and human evaluation. We obtain the same ranking of methods in both experiments, which supports the validity of our automatic evaluation paradigm.
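To make the idea of input perturbation concrete, here is a minimal sketch of a generic leave-one-out variant: each token is removed in turn, and the drop in the model's score is taken as that token's importance. This is an illustration under stated assumptions, not the procedure from the paper above. The names `perturbation_importance` and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real QA model's confidence.

```python
from typing import Callable, List, Tuple


def perturbation_importance(
    tokens: List[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out importance: score drop when each token is removed."""
    base = score(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        # Perturb the input by deleting the i-th token.
        perturbed = tokens[:i] + tokens[i + 1:]
        importances.append((tok, base - score(perturbed)))
    return importances


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a QA model's confidence in its answer."""
    weights = {"capital": 2.0, "France": 1.0}
    return sum(weights.get(t, 0.0) for t in tokens)


if __name__ == "__main__":
    question = ["What", "is", "the", "capital", "of", "France"]
    for tok, imp in perturbation_importance(question, toy_score):
        print(f"{tok}: {imp}")
```

With the toy scorer, only "capital" and "France" receive non-zero importance, because removing any other token leaves the score unchanged.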



Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects

arXiv.org Machine Learning

Visual question answering (VQA) models have been shown to over-rely on linguistic biases in VQA datasets, answering questions "blindly" without considering visual context. Adversarial regularization (AdvReg) aims to address this issue via an adversary sub-network that encourages the main model to learn a bias-free representation of the question. In this work, we investigate the strengths and shortcomings of AdvReg with the goal of better understanding how it affects inference in VQA models. Despite achieving a new state-of-the-art on VQA-CP, we find that AdvReg yields several undesirable side-effects, including unstable gradients and sharply reduced performance on in-domain examples. We demonstrate that gradual introduction of regularization during training helps to alleviate, but not completely solve, these issues. Through error analyses, we observe that AdvReg improves generalization to binary questions, but impairs performance on questions with heterogeneous answer distributions. Qualitatively, we also find that regularized models tend to over-rely on visual features, while ignoring important linguistic cues in the question. Our results suggest that AdvReg requires further refinement before it can be considered a viable bias mitigation technique for VQA.


Women Leaders in AI: Gail Blum IBM Watson

#artificialintelligence

How are you using Watson in your business? We wanted to improve the candidate experience by creating interactions with job seekers visiting our career site, as well as increase the number of applications we receive for hard-to-fill roles. Watson Candidate Assistant answers general questions about working at NBCUniversal, and it recommends jobs based on keyword matching between openings and the job seeker's resume. Candidates using a traditional job search may look by functional areas or job titles, but that might not match our company's vernacular. We can now drive candidates to roles they might not have found.


IBM's Watson Studio AutoAI automates enterprise AI model development

#artificialintelligence

Deploying AI-imbued apps and services isn't as challenging as it used to be, thanks to offerings like IBM's Watson Studio (previously Data Science Experience). Watson Studio, which debuted in 2017 after a 12-month beta period, provides an environment and tools that help to analyze, visualize, cleanse, and shape data; to ingest streaming data; and to train and optimize machine learning models in real time. And today, it's becoming even more capable with the launch of AutoAI, a set of features designed to automate tasks associated with orchestrating AI in enterprise environments. "IBM has been working closely with clients as they chart their paths to AI, and one of the first challenges many face is data prep -- a foundational step in AI," said general manager of IBM Data and AI Rob Thomas in a statement. "We have seen that complexity of data infrastructures can be daunting to the most sophisticated companies, but it can be overwhelming for those with little to no technical resources. The automation capabilities we're putting into Watson Studio are designed to smooth the process and help clients start building machine learning models and experiments faster."


Unsupervised Question Answering by Cloze Translation

arXiv.org Artificial Intelligence

Obtaining training data for Question Answering (QA) is time-consuming and resource-intensive, and existing QA datasets are only available for limited domains and languages. In this work, we explore to what extent high quality training data is actually required for Extractive QA, and investigate the possibility of unsupervised Extractive QA. We approach this problem by first learning to generate context, question and answer triples in an unsupervised manner, which we then use to synthesize Extractive QA training data automatically. To generate such triples, we first sample random context paragraphs from a large corpus of documents and then random noun phrases or named entity mentions from these paragraphs as answers. Next, we convert answers in context to "fill-in-the-blank" cloze questions and finally translate them into natural questions. We propose and compare various unsupervised ways to perform cloze-to-natural question translation, including training an unsupervised NMT model using non-aligned corpora of natural questions and cloze questions as well as a rule-based approach. We find that modern QA models can learn to answer human questions surprisingly well using only synthetic training data. We demonstrate that, without using the SQuAD training data at all, our approach achieves 56.4 F1 on SQuAD v1 (64.5 F1 when the answer is a named entity mention), outperforming early supervised models.
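The cloze-generation step described above can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: it assumes the answer span is already chosen, splits sentences with a naive regular expression, and uses a hypothetical `[MASK]` placeholder. The function name `make_cloze` is also an assumption made for this example.

```python
import re


def make_cloze(context: str, answer: str, mask: str = "[MASK]") -> str:
    """Return the first sentence containing `answer`, with the answer masked.

    Sentence splitting is a naive heuristic (split after ., ! or ?).
    """
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    for sentence in sentences:
        if answer in sentence:
            # Replace only the first occurrence of the answer span.
            return sentence.replace(answer, mask, 1)
    raise ValueError("answer does not occur in context")


if __name__ == "__main__":
    paragraph = "Paris is the capital of France. It lies on the Seine."
    print(make_cloze(paragraph, "the Seine"))
```

The resulting cloze statement would then be passed to a separate cloze-to-natural-question translation step, which is not shown here.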


Multi-hop Reading Comprehension through Question Decomposition and Rescoring

arXiv.org Artificial Intelligence

Multi-hop Reading Comprehension (RC) requires reasoning and aggregation across several paragraphs. We propose a system for multi-hop RC that decomposes a compositional question into simpler sub-questions that can be answered by off-the-shelf single-hop RC models. Since annotations for such decomposition are expensive, we recast sub-question generation as a span prediction problem and show that our method, trained using only 400 labeled examples, generates sub-questions that are as effective as human-authored sub-questions. We also introduce a new global rescoring approach that considers each decomposition (i.e., the sub-questions and their answers) to select the best final answer, greatly improving overall performance. Our experiments on HotpotQA show that this approach achieves state-of-the-art results, while providing explainable evidence for its decision making in the form of sub-questions.


Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader

arXiv.org Artificial Intelligence

We propose a new end-to-end question answering model, which learns to aggregate answer evidence from an incomplete knowledge base (KB) and a set of retrieved text snippets. Under the assumptions that the structured KB is easier to query and the acquired knowledge can help the understanding of unstructured text, our model first accumulates knowledge of entities from a question-related KB subgraph; then reformulates the question in the latent space and reads the texts with the accumulated entity knowledge at hand. Evidence from the KB and the texts is finally aggregated to predict answers. On the widely-used KBQA benchmark WebQSP, our model achieves consistent improvements across settings with different extents of KB incompleteness.


DiffQue: Estimating Relative Difficulty of Questions in Community Question Answering Services

arXiv.org Machine Learning

Automatic estimation of the relative difficulty of a pair of questions is an important and challenging problem in community question answering (CQA) services. Few studies have addressed this problem. Past studies mostly leveraged the expertise of users answering the questions and barely considered other properties of CQA services such as metadata of users and posts, temporal information and textual content. In this paper, we propose DiffQue, a novel system that maps this problem to a network-aided edge directionality prediction problem. DiffQue starts by constructing a novel network structure that captures different notions of difficulty between a pair of questions. It then measures the relative difficulty of two questions by predicting the direction of a (virtual) edge connecting these two questions in the network. It leverages features extracted from the network structure, metadata of users/posts and textual description of questions and answers. Experiments on datasets obtained from two CQA sites (further divided into four datasets) with human-annotated ground truth show that DiffQue outperforms four state-of-the-art methods by a significant margin (28.77% higher F1 score and 28.72% higher AUC than the best baseline). As opposed to the other baselines, (i) DiffQue appropriately responds to training noise, (ii) DiffQue is capable of adapting to multiple domains (CQA datasets), and (iii) DiffQue can efficiently handle the 'cold start' problem, which may arise due to the lack of information for newly posted questions or newly arrived users.


The State of Voice Search: Staying Ahead of the Rapidly-Growing Channel

#artificialintelligence

Analysts may disagree on the specific numbers, but one thing is abundantly clear--we're right in the thick of a voice revolution. Whether it's through Alexa, Siri, Google, or any other digital assistant, voice search has become an integral part of daily life for millions of people. And marketers can't be content to sit back and see how this trend plays out. Creating a voice strategy has quickly become a necessity rather than a luxury. But if you're at square one, it's easy for the "where do I start?" mentality to set in.