AITopics

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.82)

AAAI Conferences, Jul-19-2011

Open Information Extraction: The Second Generation

Etzioni, Oren (University of Washington) | Fader, Anthony (University of Washington) | Christensen, Janara (University of Washington) | Soderland, Stephen (University of Washington) | Mausam, - (University of Washington)

How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm, which eschews hand-labeled training examples and avoids domain-specific verbs and nouns in order to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both common-sense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.

argument, extraction, relation phrase, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

South America > Brazil (0.14)
Asia > Middle East > Iraq (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Genre: Research Report (0.34)

Industry: Government > Regional Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

AAAI Conferences, Jul-12-2011

Natural Language Processing to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency

In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.

machine learning, natural language, tweet, (17 more...)

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > Canada > Manitoba (0.04)
North America > United States > Virginia (0.04)
(6 more...)

Industry: Government > Military (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial Intelligence, Jun-30-2011

Grounded Semantic Composition for Visual Scenes

Gorniak, P., Roy, D.

We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system's successes and failures we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.

artificial intelligence, cone, natural language, (15 more...)

doi: 10.1613/jair.1327

1107.0031

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Nederhof, M. J., Satta, G.

IDL-Expressions: A Formalism for Representing and Parsing Finite Languages in Natural Language Processing

arXiv.org Artificial Intelligence, Jun-30-2011

Journal of Artificial Intelligence Research 21 (2004) 287-317. Submitted 06/03; published 03/04. IDL-Expressions: A Formalism for Representing and Parsing Finite Languages in Natural Language Processing. Mark-Jan Nederhof markjan let.rug.nl Faculty of Arts, University of Groningen P.O. Dept. of Information Engineering, University of Padua via Gradenigo, 6/A I-35131 Padova, Italy. Abstract: We propose a formalism for representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that were only considered in isolation in existing formalisms. The suggested applications are in natural language processing, more specifically in surface natural language generation and in machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then filtering such a set through a parser. We study several formal properties of IDL-expressions and compare this new formalism with more ...

artificial intelligence, idl-expression, natural language, (15 more...)

doi: 10.1613/jair.1309

1107.0026

Country: Europe > Italy (0.24)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.54)

arXiv.org Artificial Intelligence, Jun-23-2011

Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation

Ferrandez, A., Peral, J.

This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English, issues hardly considered by other systems. The paper presents the resolution and evaluation of these anaphora problems in AGIR with the use of different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We have obtained a precision of 80.4% and 84.8% in the translation of Spanish and English pronouns, respectively. Although we have only studied the Spanish and English languages, our approach can be easily extended to other languages such as Portuguese, Italian, or Japanese.

artificial intelligence, machine translation, natural language, (15 more...)

doi: 10.1613/jair.1115

1106.4862

Country:

North America > United States (1.00)
Europe > Spain (0.93)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

arXiv.org Artificial Intelligence, Jun-22-2011

Acquiring Word-Meaning Mappings for Natural Language Interfaces

Thompson, C.

This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE is compared to that of lexicons acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
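The idea of selecting only the most informative examples for annotation can be illustrated with a minimal sketch of uncertainty-based selection. This is not WOLFIE's actual selection criterion, which the abstract does not specify; the function name `select_for_annotation`, the confidence scores, and the example sentences are all hypothetical.

```python
def select_for_annotation(pool, confidence, batch_size):
    """Pick the `batch_size` unannotated examples the learner is least sure of.

    `confidence` is any callable mapping an example to a score, where a
    lower score means the current model is less certain about it.
    """
    return sorted(pool, key=confidence)[:batch_size]


# Hypothetical confidence scores produced by some current model.
scores = {
    "list rivers in texas": 0.90,
    "what borders ohio": 0.20,
    "largest city": 0.55,
}
chosen = select_for_annotation(list(scores), scores.get, 2)
print(chosen)
```

In a full active learning loop, the chosen examples would be annotated, added to the training set, and the model retrained before the next round of selection.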

artificial intelligence, machine learning, natural language, (20 more...)

doi: 10.1613/jair.1063

1106.4571

Country:

Europe (1.00)
North America > United States > California (0.93)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

arXiv.org Artificial Intelligence, Jun-9-2011

Parameter Learning of Logic Programs for Symbolic-Statistical Modeling

Sato, T., Kameya, Y.

We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.
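For readers unfamiliar with the inside probabilities mentioned above, the following is a minimal sketch of the standard inside computation for a PCFG in Chomsky normal form. It is not the graphical EM algorithm and does not use support graphs; the function name, the dictionary-based grammar encoding, and the toy grammar are assumptions made for illustration only.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Probability that `start` derives `words` under a CNF PCFG.

    lexical: {(A, word): p}  for rules A -> word
    binary:  {(A, B, C): p}  for rules A -> B C
    """
    n = len(words)
    # beta[(i, j)][A] = P(A derives words[i:j])
    beta = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1)][A] += p
    # Build longer spans from shorter ones, summing over split points.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    beta[(i, j)][A] += p * beta[(i, k)][B] * beta[(k, j)][C]
    return beta[(0, n)][start]


# Toy ambiguous grammar (hypothetical): S -> S S (0.4), S -> 'a' (0.6).
lexical = {("S", "a"): 0.6}
binary = {("S", "S", "S"): 0.4}
prob = inside_probability(["a", "a", "a"], lexical, binary)
print(prob)
```

The sentence "a a a" has two parse trees under this toy grammar, and the inside probability sums over both of them.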

logic & formal reasoning, machine learning, natural language, (19 more...)

doi: 10.1613/jair.912

1106.1797

Country: Asia > Japan (0.28)

Genre:

Research Report (0.49)
Instructional Material (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(3 more...)

Paradesi, Sharon Myrtle (Massachusetts Institute of Technology)

Geotagging Tweets Using Their Content

AAAI Conferences, May-18-2011

Harnessing rich but unstructured information on social networks in real time and showing it to a relevant audience based on its geographic location is a major challenge. The system developed, TwitterTagger, geotags tweets and shows them to users based on their current physical location. Experimental validation shows a performance improvement of three orders of magnitude by TwitterTagger over the baseline model.

noun phrase, tweet, twittertagger, (14 more...)

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > New York (0.06)
North America > United States > Oregon > Lane County > Springfield (0.05)
North America > United States > New Jersey (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Industry: Information Technology > Services (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.30)

AAAI Conferences, May-18-2011

Co-Occurrence-Based Error Correction Approach to Word Segmentation

Chaowicharat, Ekawat (Mahidol University) | Naruedomkul, Kanlaya (Mahidol University)

To overcome the problems in Thai word segmentation, a number of word segmentation approaches have been proposed over a long period of time. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation based on co-occurrence and an error correction algorithm. CBEC was trained and evaluated on the BEST 2009 corpus.
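The two-stage idea of generating dictionary-based candidates and then choosing among them by co-occurrence can be sketched roughly as follows. This is an illustrative approximation, not the authors' CBEC implementation: the error correction step is omitted, the scoring function is an assumption, and the toy lexicon and counts are in English rather than Thai.

```python
from functools import lru_cache


def all_segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def cooccurrence_score(words, cooccurrence):
    """Sum of (hypothetical) co-occurrence counts over adjacent word pairs."""
    return sum(cooccurrence.get(pair, 0) for pair in zip(words, words[1:]))


def segment(text, lexicon, cooccurrence):
    """Prefer the fewest words (maximal matching); break ties by co-occurrence."""
    candidates = all_segmentations(text, lexicon)
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda ws: (len(ws), -cooccurrence_score(ws, cooccurrence)),
    )


# Toy data (hypothetical): two candidates of equal length compete.
lexicon = {"the", "them", "men", "en"}
cooccurrence = {("the", "men"): 5}
result = segment("themen", lexicon, cooccurrence)
print(result)
```

Here maximal matching alone cannot decide between "the men" and "them en", since both use two words, so the co-occurrence counts break the tie.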

cbec, corpus, segmentation, (13 more...)

Twenty-Fourth International FLAIRS Conference

Country: Asia > Thailand > Bangkok > Bangkok (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.83)