AITopics

Intelligent Input Methods (IM) are essential for entering text in many East Asian scripts, but their application to other languages has not been fully explored. This paper discusses how such tools can contribute to the development of computer processing of other oriental languages. We propose a design philosophy that regards IM as a text service platform, and treats the study of IM as a cross-disciplinary subject from the perspectives of software engineering, human-computer interaction (HCI), and natural language processing (NLP). We discuss these three perspectives and indicate a number of possible future research directions.

artificial intelligence, natural language, text processing, (18 more...)

0704.3665

Country:

Asia (0.47)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Random Sentences from a Generalized Phrase-Structure Grammar Interpreter

Dale, Rick

In numerous domains in cognitive science it is often useful to have a source for randomly generated corpora. These corpora may serve as a foundation for artificial stimuli in a learning experiment (e.g., Ellefson & Christiansen, 2000), or as input into computational models (e.g., Christiansen & Dale, 2001). The following compact and general C program interprets a phrase-structure grammar specified in a text file. It follows parameters set at a Unix or Unix-based command line and generates a corpus of random sentences from that grammar. The first and required input into the program is a file that contains a phrase-structure grammar description (see below).
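The idea of generating random sentences from a phrase-structure grammar can be illustrated with a minimal sketch. This is not the C program described in the abstract: the grammar format, the toy grammar `GRAMMAR`, and the function names `generate` and `random_corpus` are all assumptions made for illustration, and each alternative expansion is chosen with equal probability.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's file format):
# each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def generate(symbol, grammar, rng):
    """Expand `symbol` recursively; symbols without a rule are terminals."""
    if symbol not in grammar:
        return [symbol]
    expansion = rng.choice(grammar[symbol])  # uniform choice among alternatives
    words = []
    for child in expansion:
        words.extend(generate(child, grammar, rng))
    return words


def random_corpus(grammar, n_sentences, start="S", seed=None):
    """Return `n_sentences` random sentences derived from `start`."""
    rng = random.Random(seed)
    return [" ".join(generate(start, grammar, rng)) for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in random_corpus(GRAMMAR, 5, seed=0):
        print(sentence)
```

Passing a fixed `seed` makes the corpus reproducible, which is convenient when the sentences are used as experimental stimuli. A recursive grammar would need a depth limit, which this sketch omits.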

artificial intelligence, natural language, string, (17 more...)

cs/0702081

Country: North America > United States (0.30)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.72)

Dependency Treebanks: Methods, Annotation Schemes and Tools

Kakkonen, Tuomo

In this paper, current dependency-based treebanks are introduced and analyzed. The methods used for building the resources, the annotation schemes applied, and the tools used (such as POS taggers, parsers and annotation software) are discussed.

artificial intelligence, natural language, treebank, (15 more...)

cs/0610124

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Kakkonen, Tuomo, Myller, Niko, Sutinen, Erkki

Applying Part-of-Speech Enhanced LSA to Automatic Essay Grading

Latent Semantic Analysis (LSA) is a widely used Information Retrieval method based on the "bag-of-words" assumption. However, it is generally accepted that syntax plays a role in representing the meaning of sentences. Thus, enhancing LSA with part-of-speech (POS) information to capture the context of word occurrences appears to be a theoretically feasible extension. The approach is tested empirically on an automatic essay grading system using LSA for document similarity comparisons. A comparison of several POS-enhanced LSA models is reported. Our findings show that the addition of contextual information in the form of POS tags can raise the accuracy of the LSA-based scoring models by up to 10.77 per cent.

artificial intelligence, natural language, pos tag, (19 more...)

cs/0610118

Country:

Europe (1.00)
North America > United States (0.95)

Genre: Research Report > New Finding (0.68)

Industry:

Education > Assessment & Standards > Student Performance (0.50)
Education > Educational Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

DepAnn - An Annotation Tool for Dependency Treebanks

Kakkonen, Tuomo

DepAnn is an interactive annotation tool for dependency treebanks, providing both graphical and text-based annotation interfaces. The tool is aimed at the semi-automatic creation of treebanks. It aids the manual inspection and correction of automatically created parses, making the annotation process faster and less error-prone. A novel feature of the tool is that it enables the user to view outputs from several parsers as the basis for creating the final tree to be saved to the treebank. DepAnn uses TIGER-XML, an XML-based general encoding format, both for representing the parser outputs and for saving the annotated treebank. The tool includes an automatic consistency checker for sentence structures. In addition, the tool enables users to build structures manually, add comments on the annotations, modify the tagsets, and mark sentences for further revision.

artificial intelligence, natural language, treebank, (16 more...)

cs/0610116

Country:

Europe (1.00)
North America > United States > Indiana (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)

Aubin, Sophie, Hamon, Thierry

Improving Term Extraction with Terminological Resources

Studies of different term extractors on a corpus of the biomedical domain revealed decreasing performance when applied to highly technical texts. The difficulty or impossibility of customising them to new domains is an additional limitation. In this paper, we propose to use external terminologies to influence generic linguistic data in order to improve the quality of the extraction. The tool we implemented exploits attested terms at different steps of the process: chunking, parsing and extraction of term candidates. Experiments reported here show that, using this method, more term candidates can be acquired with a higher level of reliability. We further describe the extraction process involving endogenous disambiguation implemented in the term extractor YaTeA.

artificial intelligence, natural language, term candidate, (17 more...)

cs/0609019

Country: Europe > France (0.28)

Genre:

Research Report (0.70)
Workflow (0.48)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.57)

Pyysalo, Sampo, Salakoski, Tapio, Aubin, Sophie, Nazarenko, Adeline

Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. The adapted parser is available under an open-source license at http://www.it.utu.fi/biolg .

artificial intelligence, extension, natural language, (19 more...)

cs/0606119

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report > Experimental Study (0.47)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.96)

Gottlob, Georg, Koch, Christoph, Schulz, Klaus U.

Conjunctive Queries over Trees

We study the complexity and expressive power of conjunctive queries over unranked labeled trees represented using a variety of structure relations such as "child", "descendant", and "following" as well as unary relations for node labels. We establish a framework for characterizing structures representing trees for which conjunctive queries can be evaluated efficiently. Then we completely chart the tractability frontier of the problem and establish a dichotomy theorem for our axis relations, i.e., we find all subset-maximal sets of axes for which query evaluation is in polynomial time and show that for all other cases, query evaluation is NP-complete. All polynomial-time results are obtained immediately using the proof techniques from our framework. Finally, we study the expressiveness of conjunctive queries over trees and show that for each conjunctive query, there is an equivalent acyclic positive query (i.e., a set of acyclic conjunctive queries), but that in general this query is not of polynomial size.
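To make the problem statement concrete, the following sketch evaluates a conjunctive query over a small labeled tree by trying every assignment of nodes to variables. It is a brute-force illustration only, exponential in the number of variables, and not one of the polynomial-time techniques the abstract refers to. The tree encoding, the names `TREE`, `AXES`, and `evaluate`, and the restriction to the "child" and "descendant" axes are assumptions.

```python
from itertools import product

# Hypothetical toy tree (an assumption): node id -> (label, parent id or None).
TREE = {
    0: ("a", None),
    1: ("b", 0),
    2: ("c", 1),
    3: ("b", 0),
}


def child(tree, x, y):
    """True if y is a child of x."""
    return tree[y][1] == x


def descendant(tree, x, y):
    """True if y is a proper descendant of x."""
    parent = tree[y][1]
    while parent is not None:
        if parent == x:
            return True
        parent = tree[parent][1]
    return False


AXES = {"child": child, "descendant": descendant}


def evaluate(tree, variables, label_atoms, axis_atoms):
    """Return every assignment that satisfies all atoms of the query.

    label_atoms: (variable, label) pairs, i.e. unary label relations.
    axis_atoms:  (axis, x, y) triples, i.e. binary structure relations.
    """
    results = []
    for nodes in product(tree, repeat=len(variables)):
        env = dict(zip(variables, nodes))
        labels_ok = all(tree[env[v]][0] == lab for v, lab in label_atoms)
        axes_ok = all(AXES[ax](tree, env[x], env[y]) for ax, x, y in axis_atoms)
        if labels_ok and axes_ok:
            results.append(env)
    return results


if __name__ == "__main__":
    # Query: a(x) AND c(y) AND descendant(x, y)
    print(evaluate(TREE, ["x", "y"], [("x", "a"), ("y", "c")],
                   [("descendant", "x", "y")]))
```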

artificial intelligence, information management, natural language, (18 more...)

cs/0602004

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Melamed, I. Dan, Wang, Wei

Statistical Machine Translation by Generalized Parsing

Designers of statistical machine translation (SMT) systems have begun to employ tree-structured translation models. Systems involving tree-structured translation models tend to be complex. This article aims to reduce the conceptual complexity of such systems, in order to make them easier to design, implement, debug, use, study, understand, explain, modify, and improve. In service of this goal, the article extends the theory of semiring parsing to arrive at a novel abstract parsing algorithm with five functional parameters: a logic, a grammar, a semiring, a search strategy, and a termination condition. The article then shows that all the common algorithms that revolve around tree-structured translation models, including hierarchical alignment, inference for parameter estimation, translation, and structured evaluation, can be derived by generalizing two of these parameters -- the grammar and the logic. The article culminates with a recipe for using such generalized parsers to train, apply, and evaluate an SMT system that is driven by tree-structured translation models.
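The role of the semiring parameter can be sketched with an ordinary CKY recognizer whose arithmetic is supplied from outside. This is a simplified illustration, not the article's abstract parsing algorithm: the logic (CKY over a Chomsky-normal-form grammar), the search strategy (bottom-up by span width), and the termination condition (exhaustive) are fixed here, and only the grammar and the semiring vary. The names `Semiring`, `cky`, and `lift`, and the toy grammar, are assumptions.

```python
from collections import namedtuple

# A semiring bundles an additive identity, a multiplicative identity,
# and the two operations used to combine chart items.
Semiring = namedtuple("Semiring", "zero one plus times")

BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
VITERBI = Semiring(0.0, 1.0, max, lambda a, b: a * b)
INSIDE = Semiring(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b)

# Hypothetical ambiguous toy grammar: S -> S S (0.5), S -> "a" (0.5).
LEXICAL = {("S", "a"): 0.5}
BINARY = {("S", "S", "S"): 0.5}


def cky(words, lexical, binary, start, semiring, lift=lambda w: w):
    """Return the semiring value of `start` spanning all of `words`.

    `lift` maps a rule weight into the semiring's value domain.
    """
    n = len(words)
    chart = {}  # (i, j, nonterminal) -> semiring value

    def add(key, value):
        chart[key] = semiring.plus(chart.get(key, semiring.zero), value)

    # Width-1 spans come from lexical rules.
    for i, word in enumerate(words):
        for (lhs, terminal), weight in lexical.items():
            if terminal == word:
                add((i, i + 1, lhs), lift(weight))

    # Wider spans combine two adjacent, narrower spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, left, right), weight in binary.items():
                    l_val = chart.get((i, k, left), semiring.zero)
                    r_val = chart.get((k, j, right), semiring.zero)
                    add((i, j, lhs),
                        semiring.times(lift(weight), semiring.times(l_val, r_val)))

    return chart.get((0, n, start), semiring.zero)


if __name__ == "__main__":
    words = ["a", "a", "a"]
    print(cky(words, LEXICAL, BINARY, "S", BOOLEAN, lift=lambda w: w > 0))
    print(cky(words, LEXICAL, BINARY, "S", VITERBI))
    print(cky(words, LEXICAL, BINARY, "S", INSIDE))
```

With the same chart-filling code, the Boolean semiring answers whether a parse exists, the Viterbi semiring gives the weight of the best parse, and the inside semiring gives the total weight of all parses.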

algorithm, artificial intelligence, natural language, (16 more...)

cs/0407005

Country:

Europe (1.00)
North America > United States > Maryland (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

NLOMJ--Natural Language Object Model in Java

Jia, Jiyou

We have developed a web-based human-computer interaction system with natural language for foreign language learning: CSIEC (Computer Simulator in Educational Communication) [1]. The kernel of this system is the natural language understanding mechanism (NLML, NLOMJ and NLDB) and the communicational response (CR). NLML (Natural Language Markup Language) is a markup language for describing the grammar of an expression in a natural language. It is produced for an expression of this natural language by a parser written according to the grammar rules and lexicon of this language [2]. We use English as the experimental language in our system. For example, the NLML for the sentence "I come" is

artificial intelligence, natural language, verb phrase, (16 more...)

cs/0404041

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.51)