Representing Textual Content in a Generic Extraction Model

AAAI Conferences

The system described in this paper automatically extracts and stores information from documents. We have implemented a text processing system that uses shallow parsing techniques to extract information from sentences in text documents and stores the resulting frames of information in a knowledge base. We intend to use this system in two main application areas: open-domain question answering (Q&A) and domain-specific information extraction.


Dependency-based Text Graphs for Keyphrase and Summary Extraction with Applications to Interactive Content Retrieval

arXiv.org Artificial Intelligence

We build a bridge between neural network-based machine learning and graph-based natural language processing and introduce a unified approach to keyphrase, summary and relation extraction by aggregating dependency graphs from links provided by a deep-learning-based dependency parser. We reorganize dependency graphs to focus on the most relevant content elements of a sentence and integrate sentence identifiers as graph nodes; after ranking the graph, we extract our keyphrases and summaries from its largest strongly connected component. We take advantage of the implicit structural information that dependency links bring to extract subject-verb-object, is-a and part-of relations. We put it all together into a proof-of-concept dialog engine that specializes the text graph with respect to a query and interactively reveals the document's most relevant content elements. The open-source code of the integrated system is available at https://github.com/ptarau/DeepRank.
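The ranking and component-extraction steps described in this abstract can be illustrated with a minimal sketch. It is not the DeepRank code: the edge list is hand-written rather than produced by a dependency parser, plain PageRank stands in for whatever ranking the system uses, and the names `pagerank`, `largest_scc`, and the node `s0` (a sentence identifier) are assumptions made for illustration.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out_links[n] or nodes  # dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

def largest_scc(edges):
    """Largest strongly connected component via Kosaraju's two-pass DFS."""
    nodes = sorted({n for edge in edges for n in edge})
    fwd, rev = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        fwd[src].append(dst)
        rev[dst].append(src)

    def dfs(start, graph, seen, out):
        # Iterative DFS that appends nodes to `out` in post-order.
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    seen, order = set(), []
    for n in nodes:
        if n not in seen:
            dfs(n, fwd, seen, order)
    seen, best = set(), []
    for n in reversed(order):
        if n not in seen:
            component = []
            dfs(n, rev, seen, component)
            if len(component) > len(best):
                best = component
    return set(best)

# Hypothetical text graph: word nodes plus one sentence-identifier node "s0".
edges = [("the", "parser"), ("parser", "builds"), ("builds", "graph"),
         ("graph", "parser"), ("graph", "s0")]
ranks = pagerank(edges)
core = largest_scc(edges)
keyphrases = sorted(core, key=ranks.get, reverse=True)
```

Restricting candidates to the largest strongly connected component keeps only nodes that are mutually reachable, which filters out peripheral words such as `the` before the ranked list is read off.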


He

AAAI Conferences

Distantly supervised relation extraction is an efficient approach to scaling relation extraction to very large corpora, and has been widely used to find novel relational facts from plain text. Recent studies on neural relation extraction have shown great progress on this task by modeling sentences in low-dimensional spaces, but have seldom used syntactic information to model the entities. In this paper, we propose to learn syntax-aware entity embeddings for neural relation extraction. First, we encode the context of entities on a dependency tree as a sentence-level entity embedding based on tree-GRU. Then, we utilize both intra-sentence and inter-sentence attention to obtain a sentence set-level entity embedding over all sentences containing the focus entity pair. Finally, we combine the sentence embedding and the entity embedding for relation classification. We conduct experiments on a widely used real-world dataset, and the experimental results show that our model can make full use of all informative instances and achieve state-of-the-art performance in relation extraction.
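The idea of attending over all sentences that mention an entity pair can be sketched generically. The snippet below is an assumption-laden illustration, not the paper's model: it uses hand-written vectors instead of tree-GRU outputs, a single dot-product attention step instead of the intra- and inter-sentence attentions, and the hypothetical names `attention_pool`, `sentence_vectors`, and `relation_query`.

```python
import math

def attention_pool(vectors, query):
    """Softmax-weighted average of `vectors`, scored by dot product with `query`."""
    scores = [sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors))
              for d in range(len(query))]
    return weights, pooled

# Hypothetical sentence-level embeddings for one entity pair (three sentences).
sentence_vectors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
relation_query = [1.0, 0.0]  # hypothetical query vector for one relation
weights, bag_embedding = attention_pool(sentence_vectors, relation_query)
```

Sentences whose embeddings align with the query receive larger weights, so noisy sentences in a distantly supervised bag contribute less to the pooled representation.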


A Bootstrapping Approach to Information Extraction Domain Porting

AAAI Conferences

This paper presents a seed-driven, bootstrapping approach to domain porting that could be used to customize a generic information extraction (IE) capability for a specific domain. The approach is based on the existence of a robust, domain-independent IE engine that can continue to be enhanced, independent of any particular domain. It combines the strengths of parsing-based symbolic rule learning and a high-performance, linear, string-based Hidden Markov Model (HMM) to automatically derive a customized IE system with balanced precision and recall. The key idea is to apply precision-oriented symbolic rules learned in the first stage to a large corpus in order to construct an automatically tagged training corpus. This training corpus is then used to train an HMM to boost the recall. The experiments conducted in named entity (NE) tagging and relationship extraction show performance close to that of supervised learning systems.
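The two-stage idea in this abstract (precise rules tag a corpus, then an HMM trained on that corpus recovers more instances) can be sketched on a toy example. Everything below is an assumption for illustration: the single hand-written title rule stands in for learned symbolic rules, the three-sentence corpus is invented, add-one smoothing is an arbitrary choice, and the names `rule_tag`, `train_hmm`, and `viterbi` are hypothetical.

```python
import math
from collections import Counter

TAGS = ["O", "PER"]
TITLES = {"Mr.", "Dr."}

def rule_tag(tokens):
    """Stage 1: a precision-oriented rule; only a token right after a title is PER."""
    return ["PER" if i > 0 and tokens[i - 1] in TITLES else "O"
            for i in range(len(tokens))]

def train_hmm(tagged_sentences):
    """Stage 2: count transitions and emissions from the auto-tagged corpus."""
    trans, emit, vocab = Counter(), Counter(), set()
    for tokens, tags in tagged_sentences:
        prev = "<s>"
        for tok, tag in zip(tokens, tags):
            trans[prev, tag] += 1
            emit[tag, tok] += 1
            vocab.add(tok)
            prev = tag
    return trans, emit, vocab

def viterbi(tokens, model):
    """Most likely tag sequence under the HMM, with add-one smoothing."""
    trans, emit, vocab = model

    def log_t(prev, tag):
        total = sum(trans[prev, t] for t in TAGS)
        return math.log((trans[prev, tag] + 1) / (total + len(TAGS)))

    def log_e(tag, tok):
        total = sum(c for (t, _), c in emit.items() if t == tag)
        return math.log((emit[tag, tok] + 1) / (total + len(vocab) + 1))

    score = {t: log_t("<s>", t) + log_e(t, tokens[0]) for t in TAGS}
    back = []
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: score[p] + log_t(p, t))
            new_score[t] = score[best_prev] + log_t(best_prev, t) + log_e(t, tok)
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)
    tag = max(TAGS, key=score.get)
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Invented corpus, tagged automatically by the rule (no manual annotation).
corpus = [s.split() for s in ("Mr. Smith met Dr. Jones",
                              "Dr. Jones called Mr. Smith",
                              "the report arrived")]
model = train_hmm([(tokens, rule_tag(tokens)) for tokens in corpus])

unseen = "Smith called Jones".split()
rule_output = rule_tag(unseen)        # the rule finds no names without titles
hmm_output = viterbi(unseen, model)   # the HMM tags both names as PER
```

On the unseen sentence the rule alone tags nothing because no title is present, while the HMM has learned from the auto-tagged corpus that `Smith` and `Jones` are emitted by the `PER` state, which is the recall gain the abstract refers to.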


Building Integrated Opinion Delivery Environment

AAAI Conferences

We introduce a search engine and information retrieval system for providing access to opinion data. Generalization of syntactic parse trees is introduced as a similarity measure between the subjects of textual opinions, so that they can be linked on the fly. An information extraction algorithm for automatic summarization of web pages in the format of Google sponsored links is presented. We outline the usability of the implemented system, the integrated opinion delivery environment (IODE).