AITopics | Information Extraction

Collaborating Authors

Information Extraction

News Overviews Instructional Materials AI-Alerts Classics

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

Journal of Artificial Intelligence ResearchNov-17-2010

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly-used sentiment datasets.

algorithm, dimension, eigenvector, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3003

AI Access Foundation

10677

Journal of Artificial Intelligence Research

Country:

North America > Central America (0.14)
Asia > Middle East > Jordan (0.04)
South America (0.04)
(7 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.93)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(2 more...)

Add feedback

SenticNet: A Publicly Available Semantic Resource for Opinion Mining

Cambria, Erik (University of Stirling) | Speer, Robyn (Massachusetts Institute of Technology) | Havasi, Catherine (Massachusetts Institute of Technology) | Hussain, Amir (University of Stirling)

AAAI ConferencesNov-5-2010

Today millions of web-users express their opinions about many topics through blogs, wikis, fora, chats and social networks. For sectors such as e-commerce and e-tourism, it is very useful to automatically analyze the huge amount of social information available on the Web, but the extremely unstructured nature of these contents makes it a difficult task. SenticNet is a publicly available resource for opinion mining built exploiting AI and Semantic Web techniques. It uses dimensionality reduction to infer the polarity of common sense concepts and hence provide a public resource for mining opinions from natural language text at a semantic, rather than just syntactic, level.

artificial intelligence, natural language, text processing, (18 more...)

AAAI Conferences

2010 AAAI Fall Symposium Series

Country:

Europe > United Kingdom (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.87)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.85)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)

Add feedback

Adapting Open Information Extraction to Domain-Specific Relations

AI MagazineOct-10-2010

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA's Machine Reading Project.

artificial intelligence, inductive learning, Open Information Extraction, (9 more...)

AI Magazine

Industry:

Government > Regional Government > North America Government > US Government (0.67)
Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Add feedback

Adapting Open Information Extraction to Domain-Specific Relations

AI MagazineOct-10-2010

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. Due to its open-domain and open-relation nature, Open IE is purely textual and is unable to relate the surface forms to an ontology, if known in advance. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project. Our system achieves precision over 0.90 from as few as 8 training examples for an NFL-scoring domain.

machine learning, natural language, relation, (19 more...)

AI Magazine

Country:

North America > United States > California (0.15)
North America > United States > New York (0.14)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Government > Military (0.69)
Government > Regional Government > North America Government > United States Government (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.76)

Add feedback

Sentiment Extraction: Integrating Statistical Parsing, Semantic Analysis, and Common Sense Reasoning

Shastri, Lokendra (Infosys Technologies Limited) | Parvathy, Anju G. (Infosys Technologies Limited) | Kumar, Abhishek (Infosys Technologies Limited) | Wesley, John (Infosys Technologies Limited) | Blakrishnan, Rajesh (Infosys Technologies Limited)

AAAI ConferencesJul-15-2010

Much of the ongoing explosion of digital content is in the form of text. This content is a virtual gold-mine of information that can inform a range of social, governmental, and business decisions. For example, using content available on blogs and social networking sites businesses can find out what its customers are saying about their products and services. In the digital age where customer is king, the business value of ascertaining consumer sentiment cannot be overstated. People express sentiments in myriad ways. At times, they use simple, direct assertions, but most often they use sentences involving comparisons, conjunctions expressing multiple and possibly opposing sentiments about multiple features and entities,and pronominal references whose resolution requires discourse level context. Frequently people use abbreviations, slang, SMSese, idioms and metaphors. Understanding the latter also requires common sense reasoning. In this paper, we present iSEE, a fully implemented sentiment extraction engine, which makes use of statistical methods, classical NLU techniques, common sense reasoning, and probabilistic inference to extract entity and feature specific sentiment from complex sentences and dialog. Most of the components of iSEE are domain independent and the system can be generalized to new domains by simply adding domain relevant lexicons.

artificial intelligence, natural language, sentiment, (17 more...)

AAAI Conferences

Twenty-Second IAAI Conference

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Temporal Information Extraction

Ling, Xiao (University of Washington) | Weld, Daniel S. (University of Washington)

AAAI ConferencesJul-15-2010

Research on information extraction (IE) seeks to distill relational tuples from natural language text, such as the contents of the WWW. Most IE work has focussed on identifying static facts, encoding them as binary relations. This is unfortunate, because the vast majority of facts are fluents, only holding true during an interval of time. It is less helpful to extract PresidentOf(Bill-Clinton, USA) without the temporal scope 1/20/93 — 1/20/01. This paper presents TIE, a novel, information-extraction system, which distills facts from text while inducing as much temporal information as possible. In addition to recognizing temporal relations between times and events, TIE performs global inference, enforcing transitivity to bound the start and ending times for each event. We introduce the notion of temporal entropy as a way to evaluate the performance of temporal IE systems and present experiments showing that TIE outperforms three alternative approaches.

data mining, natural language, relation, (16 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Washington > King County > Seattle (0.14)
Oceania > Australia (0.04)

Genre: Research Report (0.46)

Industry: Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.82)

Add feedback

Sentiment Analysis with Global Topics and Local Dependency

Li, Fangtao (Tsinghua University) | Huang, Minlie (Tsinghua University) | Zhu, Xiaoyan (Tsinghua University)

AAAI ConferencesJul-15-2010

With the development of Web 2.0, sentiment analysis has now become a popular research problem to tackle. Recently, topic models have been introduced for the simultaneous analysis for topics and the sentiment in a document. These studies, which jointly model topic and sentiment, take the advantage of the relationship between topics and sentiment, and are shown to be superior to traditional sentiment analysis tools. However, most of them make the assumption that, given the parameters, the sentiments of the words in the document are all independent. In our observation, in contrast, sentiments are expressed in a coherent way. The local conjunctive words, such as “and” or “but”, are often indicative of sentiment transitions. In this paper, we propose a major departure from the previous approaches by making two linked contributions. First, we assume that the sentiments are related to the topic in the document, and put forward a joint sentiment and topic model, i.e. Sentiment-LDA. Second, we observe that sentiments are dependent on local context. Thus, we further extend the Sentiment-LDA model to Dependency-Sentiment-LDA model by relaxing the sentiment independent assumption in Sentiment-LDA. The sentiments of words are viewed as a Markov chain in Dependency-Sentiment-LDA. Through experiments, we show that exploiting the sentiment dependency is clearly advantageous, and that the Dependency-Sentiment-LDA is an effective approach for sentiment analysis.

machine learning, natural language, sentiment, (17 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
North America > Canada (0.04)
(2 more...)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

Prioritization of Domain-Specific Web Information Extraction

Huang, Jian (Pennsylvania State University) | Yu, Cong (Yahoo! Research)

AAAI ConferencesJul-15-2010

It is often desirable to extract structured information from raw web pages for better information browsing, query answering, and pattern mining. many such Information Extraction (IE) technologies are costly and applying them at the web-scale is impractical. In this paper, we propose a novel prioritization approach where candidate pages from the corpus are ordered according to their expected contribution to the extraction results and those with higher estimated potential are extracted earlier. Systems employing this approach can stop the extraction process at any time when the resource gets scarce (i.e., not all pages in the corpus can be processed), without worrying about wasting extraction effort on unimportant pages. More specifically, we define a novel notion to measure the value of extraction results and design various mechanisms for estimating a candidate page’s contribution to this value. We further design and build the Extraction Prioritization (EP) system with efficient scoring and scheduling algorithms, and experimentally demonstrate that EP significantly outperforms the naive approach and is more flexible than the classifier approach.

data mining, extraction, natural language, (17 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > New York > New York County > New York City (0.04)

Industry: Consumer Products & Services > Restaurants (0.69)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.62)

Add feedback

Toward an Architecture for Never-Ending Language Learning

Carlson, Andrew (Carnegie Mellon University) | Betteridge, Justin (Carnegie Mellon University) | Kisiel, Bryan (Carnegie Mellon University) | Settles, Burr (Carnegie Mellon University) | Hruschka, Estevam R. (Federal University of Sao Carlos) | Mitchell, Tom M. (Carnegie Mellon University)

AAAI ConferencesJul-15-2010

We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent.

machine learning, natural language, predicate, (19 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
(6 more...)

Industry:

Education > Curriculum > Subject-Specific Education (0.50)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
(2 more...)

Add feedback

Bidirectional Integration of Pipeline Models

Yu, Xiaofeng (The Chinese University of Hong Kong) | Lam, Wai (The Chinese University of Hong Kong)

AAAI ConferencesJul-15-2010

Traditional information extraction systems adopt pipeline strategies, which are highly ineffective and suffer from several problems such as error propagation. Typically, pipeline models fail to produce highly-accurate final output. On the other hand, there has been growing interest in integrated or joint models which explore mutual benefits and perform multiple subtasks simultaneously to avoid problems caused by pipeline models. However, building such systems usually increases computational complexity and requires considerable engineering. This paper presents a general, strongly-coupled, and bidirectional architecture based on discriminatively trained factor graphs for information extraction. First we introduce joint factors connecting variables of relevant subtasks to capture dependencies and interactions between them. We then propose a strong bidirectional MCMC sampling inference algorithm which allows information to flow in both directions to find the approximate MAP solution for all subtasks. Extensive experiments on entity identification and relation extraction using real-world data illustrate the promise of our approach.

artificial intelligence, machine learning, natural language, (20 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback