IBM Research AI


Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph

AAAI Conferences

While conversing with chatbots, humans typically tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them jointly to evaluate such real-world scenarios faced by bots involving both these tasks. Towards this end, we introduce the task of Complex Sequential QA, which combines the two tasks of (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor-intensive semi-automatic process, involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large-scale QA datasets, which contain simple questions that can be answered from a single tuple, the questions in our dialogs require reasoning over a larger subgraph of the KG. Specifically, our dataset has questions which require logical, quantitative, and comparative reasoning as well as their combinations. This calls for models which can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreference and ellipsis in utterances, (iii) ask for clarifications for ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state-of-the-art dialog and QA models show that they clearly do not achieve the above objectives and are inadequate for dealing with such complex real-world settings. We believe that this new dataset, coupled with the limitations of existing models reported in this paper, should encourage further research in Complex Sequential QA.


Democratization of Deep Learning Using DARVIZ

AAAI Conferences

With an abundance of research papers in deep learning, adoption and reproducibility of existing works become a challenge. To make a DL developer's life easier, we propose a novel system, DARVIZ, to visually design a DL model using a drag-and-drop framework in a platform-agnostic manner. The code can be automatically generated in both Caffe and Keras. DARVIZ can import (i) any existing Caffe code, or (ii) a research paper containing a DL design; extract the design; and present it in the visual editor.


Hi, How Can I Help You?: Automating Enterprise IT Support Help Desks

AAAI Conferences

Question answering is one of the primary challenges of natural language understanding. In realizing such a system, providing complex, long answers to questions is a challenging task as opposed to factoid answering, as the former needs context disambiguation. The different methods explored in the literature can be broadly classified into three categories, namely: 1) classification based, 2) knowledge graph based, and 3) retrieval based. Individually, none of them addresses the need for an enterprise-wide assistance system for an IT support and maintenance domain. In this domain, the variance of answers is large, ranging from factoids to structured operating procedures; the knowledge is spread across heterogeneous data sources like application-specific documentation and ticket management systems; and no single technique for general-purpose assistance is able to scale to such a landscape. To address this, we have built a cognitive platform with capabilities adapted for this domain. Further, we have built a general-purpose question answering system leveraging the platform that can be instantiated for multiple products and technologies in the support domain. The system uses a novel hybrid answering model that orchestrates across a deep learning classifier, a knowledge graph based context disambiguation module, and a sophisticated bag-of-words search system. This orchestration performs context switching for a given question and also does a smooth hand-off of the question to a human expert if none of the automated techniques can provide a confident answer. This system has been deployed across 675 internal enterprise IT support and maintenance projects.
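As a rough illustration of this kind of confidence-based orchestration, the Python sketch below uses hypothetical component interfaces and thresholds (not the deployed system's actual API): a question is routed through a classifier, a knowledge-graph disambiguator, and a bag-of-words retriever, with a hand-off to a human expert when no component is confident.

def answer(question, classifier, kg_disambiguator, bow_search,
           clf_threshold=0.8, search_threshold=0.5):
    # Try the deep learning classifier first; accept its answer only if confident.
    label, confidence = classifier(question)
    if confidence >= clf_threshold:
        return {"answer": label, "source": "classifier"}
    # Otherwise use the knowledge graph to disambiguate the question's context,
    # then run a bag-of-words search restricted to that context.
    context = kg_disambiguator(question)
    hits = bow_search(question, context)
    if hits and hits[0]["score"] >= search_threshold:
        return {"answer": hits[0]["text"], "source": "retrieval"}
    # No automated technique is confident: hand off smoothly to a human expert.
    return {"answer": None, "source": "human_handoff"}

# Toy usage with stand-in components.
result = answer(
    "How do I reset the application password?",
    classifier=lambda q: ("password_reset_procedure", 0.65),
    kg_disambiguator=lambda q: "identity-management",
    bow_search=lambda q, ctx: [{"text": "Open the admin console and ...", "score": 0.7}],
)
print(result["source"])   # retrieval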


Agent Assist: Automating Enterprise IT Support Help Desks

AAAI Conferences

In this paper, we present Agent Assist, a virtual assistant which helps IT support staff resolve tickets faster. It is essentially a conversation system which provides procedural and often complex answers to queries. The system can ingest knowledge from various sources like application documentation, ticket management systems, and knowledge transfer video recordings. It uses an ensemble of techniques, such as question classification, knowledge graph based disambiguation, and information retrieval, to provide quick and relevant solutions to problems from various technical domains, and is currently being used in more than 650 projects within IBM.


Neural Cross-Lingual Entity Linking

AAAI Conferences

A major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions that might refer to different Wikipedia entities in different contexts. The problem is exacerbated in cross-lingual EL, which involves linking mentions written in non-English documents to entries in the English Wikipedia: to compare textual clues across languages, we need to compute the similarity between textual fragments across languages. In this paper, we propose a neural EL model that learns fine-grained similarities and dissimilarities between the query and candidate documents from multiple perspectives, combined with convolution and tensor networks. Further, we show that this English-trained system can be applied, in a zero-shot fashion, to other languages by making surprisingly effective use of multi-lingual embeddings. The proposed system yields state-of-the-art results on the English as well as the cross-lingual (Spanish and Chinese) TAC 2015 datasets.
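To make the zero-shot cross-lingual idea concrete, here is a minimal Python sketch with toy hand-made vectors and entity names (a real system would use trained multi-lingual embeddings and the paper's convolution and tensor similarity networks): the mention context and each candidate entity description are embedded into one shared space and scored by cosine similarity.

import numpy as np

# Toy shared embedding space: words from different languages map to nearby vectors.
emb = {
    "presidente": np.array([0.90, 0.10, 0.00]),
    "president":  np.array([0.88, 0.12, 0.00]),
    "rio":        np.array([0.00, 0.20, 0.90]),
    "river":      np.array([0.05, 0.18, 0.92]),
}

def embed(tokens):
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

mention_context = ["presidente"]   # Spanish query context
candidates = {"President_of_the_United_States": ["president"],
              "Mississippi_River": ["river"]}
scores = {e: cosine(embed(mention_context), embed(desc)) for e, desc in candidates.items()}
print(max(scores, key=scores.get))   # links to the "president" entity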


DLPaper2Code: Auto-Generation of Code From Deep Learning Research Papers

AAAI Conferences

With an abundance of research papers in deep learning, reproducibility or adoption of existing works becomes a challenge, often due to the lack of open-source implementations provided by the authors. Even if the source code is available, re-implementing research papers in a different library is a daunting task. To address these challenges, we propose a novel extensible approach, DLPaper2Code, to extract and understand the deep learning design flow diagrams and tables available in a research paper and convert them into an abstract computational graph. The extracted computational graph is then converted into execution-ready source code in both Keras and Caffe, in real time. An arXiv-like website has been created where the automatically generated designs for 5,000 research papers are made publicly available. The generated designs can be rated and edited using an intuitive drag-and-drop UI framework in a crowdsourced manner. To evaluate our approach, we created a simulated dataset with over 216,000 valid deep learning design flow diagrams using a manually defined grammar. Experiments on the simulated dataset show that the proposed framework provides more than 93% accuracy in flow diagram content extraction.
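The last mile of such a pipeline, turning an abstract computational graph into Keras code, can be illustrated with a small Python sketch. The layer schema below is an assumption made for illustration, not the paper's actual intermediate representation.

# Hypothetical abstract computational graph extracted from a paper's flow diagram.
graph = [
    {"type": "Conv2D", "filters": 32, "kernel": 3, "activation": "relu"},
    {"type": "MaxPooling2D", "pool": 2},
    {"type": "Flatten"},
    {"type": "Dense", "units": 10, "activation": "softmax"},
]

def to_keras_source(layers, input_shape=(28, 28, 1)):
    """Emit execution-ready Keras model-building code as a string."""
    lines = ["from tensorflow import keras",
             "model = keras.Sequential()",
             f"model.add(keras.layers.Input(shape={input_shape}))"]
    for l in layers:
        if l["type"] == "Conv2D":
            lines.append(f"model.add(keras.layers.Conv2D({l['filters']}, {l['kernel']}, activation='{l['activation']}'))")
        elif l["type"] == "MaxPooling2D":
            lines.append(f"model.add(keras.layers.MaxPooling2D({l['pool']}))")
        elif l["type"] == "Flatten":
            lines.append("model.add(keras.layers.Flatten())")
        elif l["type"] == "Dense":
            lines.append(f"model.add(keras.layers.Dense({l['units']}, activation='{l['activation']}'))")
    return "\n".join(lines)

print(to_keras_source(graph))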


EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples

AAAI Conferences

Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples: a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on L2 and L∞ distortion metrics. However, despite the fact that L1 distortion accounts for the total variation and encourages sparsity in the perturbation, little has been developed for crafting L1-based adversarial examples. In this paper, we formulate the process of attacking DNNs via adversarial examples as an elastic-net regularized optimization problem. Our elastic-net attacks to DNNs (EAD) feature L1-oriented adversarial examples and include the state-of-the-art L2 attack as a special case. Experimental results on MNIST, CIFAR10, and ImageNet show that EAD can yield a distinct set of adversarial examples with small L1 distortion and attains similar attack performance to the state-of-the-art methods in different attack scenarios. More importantly, EAD leads to improved attack transferability and complements adversarial training for DNNs, suggesting novel insights on leveraging L1 distortion in adversarial machine learning and the security implications of DNNs.
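One common way to write such an elastic-net regularized attack objective (the notation below is my own reading of the abstract, not quoted from the paper): given an original image x_0, a target class t, and an attack loss f(x, t) that is low when the classifier predicts t, solve

\min_{x \in [0,1]^p} \; c \cdot f(x, t) \;+\; \beta \,\lVert x - x_0 \rVert_1 \;+\; \lVert x - x_0 \rVert_2^2

where beta controls the weight of the L1 term (setting beta = 0 leaves only the L2 penalty, giving an L2-style attack as a special case) and c trades off attack success against distortion.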


R^3: Reinforced Ranker-Reader for Open-Domain Question Answering

AAAI Conferences

In recent years, researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state-of-the-art results in simplified closed-domain settings such as the SQuAD dataset (Rajpurkar et al. 2016), which provides a pre-selected passage from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., Wikipedia) instead of a pre-selected passage (Chen et al. 2017a). This setting is more complex, as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that “reads” the passages to generate an answer to the question. Performance in this setting lags well behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader (R^3), based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of the likelihood of extracting the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-extraction Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets.
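A toy Python sketch of the reinforcement-learning idea, with a stand-in linear ranker and a binary reward, purely for illustration: in the paper, the Ranker and Reader are neural models and the reward comes from the quality of the Reader's extracted answer.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ranker_scores(theta, passage_feats):
    return passage_feats @ theta            # one score per retrieved passage

def reader_reward(passage_id, gold_passage_id):
    # Stand-in for answer quality (e.g. F1 of the Reader's extracted span).
    return 1.0 if passage_id == gold_passage_id else 0.0

rng = np.random.default_rng(0)
theta = np.zeros(4)                          # ranker parameters
passage_feats = rng.normal(size=(10, 4))     # features of 10 retrieved passages
gold = 3                                     # index of the passage containing the answer
lr = 0.1

for step in range(200):
    probs = softmax(ranker_scores(theta, passage_feats))
    k = rng.choice(len(probs), p=probs)      # ranker samples a passage to read
    r = reader_reward(k, gold)               # reward from the reader's answer
    # REINFORCE: grad of log pi(k) for a softmax policy over linear scores.
    grad_logp = passage_feats[k] - probs @ passage_feats
    theta += lr * r * grad_logp              # policy-gradient update of the ranker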


Dynamic Determinantal Point Processes

AAAI Conferences

The determinantal point process (DPP) has been receiving increasing attention in machine learning as a generative model of subsets consisting of relevant and diverse items. Recently, there has been significant progress in developing efficient algorithms for learning the kernel matrix that characterizes a DPP. Here, we propose a dynamic DPP, which is a DPP whose kernel can change over time, and develop efficient learning algorithms for it. In the dynamic DPP, the kernel depends on the subsets selected in the past, but we assume a particular structure in the dependency to allow efficient learning. We also assume that the kernel has low rank and exploit a recently proposed learning algorithm for the DPP with low-rank factorization, but we further show that its bottleneck computation can be reduced from O(M^2K) time to O(MK^2) time, where M is the number of items under consideration and K is the rank of the kernel, which can be set smaller than M by orders of magnitude.
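The flavor of such low-rank savings can be seen in a minimal numpy sketch using Sylvester's determinant identity for a rank-K kernel; this illustrates why low-rank structure turns M-sized linear algebra into K-sized linear algebra, and is not the paper's specific O(MK^2) algorithm.

import numpy as np

rng = np.random.default_rng(0)
M, K = 500, 10                       # many items, low-rank kernel
V = rng.normal(size=(M, K))
L = V @ V.T                          # low-rank DPP kernel L = V V^T

# DPP normalizer det(L + I): naive computation works on an M x M matrix,
# while Sylvester's identity det(I_M + V V^T) = det(I_K + V^T V) needs only K x K.
naive = np.linalg.slogdet(L + np.eye(M))[1]
fast = np.linalg.slogdet(np.eye(K) + V.T @ V)[1]
print(np.allclose(naive, fast))      # True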


The Conference Paper Assignment Problem: Using Order Weighted Averages to Assign Indivisible Goods

AAAI Conferences

We propose a novel mechanism for solving the assignment problem when we have a two-sided matching problem with preferences from one side (the agents/reviewers) over the other side (the objects/papers) and both sides have capacity constraints. The assignment problem is fundamental in both computer science and economics, with applications in many areas including task and resource allocation. Drawing inspiration from work in multi-criteria decision making and social choice theory, we use order weighted averages (OWAs), a parameterized class of mean aggregators, to propose a novel and flexible class of algorithms for the assignment problem. We give an algorithm for finding a SUM-OWA assignment in polynomial time, in contrast to the NP-hardness of finding an egalitarian assignment. We demonstrate through empirical experiments that using SUM-OWA assignments can lead to high-quality and fairer assignments.
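To illustrate the objective being optimized, here is a small Python sketch of scoring one candidate assignment under SUM-OWA; the toy utilities, bundle sizes, and OWA weights are made up for illustration and are not from the paper.

import numpy as np

def owa(values, weights):
    """Order weighted average: sort values in descending order, then dot with weights."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(v @ weights[:len(v)])

# Toy instance: 2 reviewers with utilities over 3 papers; each reviewer takes 2 papers.
utilities = np.array([[5, 3, 1],
                      [4, 4, 2]])
weights = np.array([1.0, 0.5])       # OWA weights over a size-2 bundle (favors the best item)

# SUM-OWA score of one candidate assignment: reviewer 0 -> {0, 2}, reviewer 1 -> {1, 2}.
assignment = [[0, 2], [1, 2]]
score = sum(owa(utilities[i, bundle], weights) for i, bundle in enumerate(assignment))
print(score)                          # 5.5 + 5.0 = 10.5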