AITopics

In a highly multilingual and multicultural environment such as the European Commission, which will soon have more than twenty official languages, there is an urgent need for text analysis tools that use minimal linguistic knowledge so that they can be adapted to many languages with little human effort. We present two such Information Extraction tools that have already been adapted to various Western and Eastern European languages: one for the recognition of date expressions in text, and one for the detection of geographical place names and the visualisation of the results on geographical maps. An evaluation of the performance has produced very satisfying results.
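As an illustration of what "minimal linguistic knowledge" could look like for date recognition, the sketch below keeps one language-independent pattern and swaps in a per-language month lexicon. It is not the authors' tool: the `MONTHS` table, the `find_dates` function, and the single "day month year" pattern are assumptions made for this example.

```python
import re

# Hypothetical per-language month lexicons: the only language-specific resource.
MONTHS = {
    "en": {name: i + 1 for i, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"])},
    "de": {name: i + 1 for i, name in enumerate(
        ["januar", "februar", "märz", "april", "mai", "juni", "juli",
         "august", "september", "oktober", "november", "dezember"])},
}


def find_dates(text, lang):
    """Return (year, month, day) tuples for 'day month year' expressions."""
    months = MONTHS[lang]
    # Longest names first so a shorter name never shadows a longer one.
    names = sorted(map(re.escape, months), key=len, reverse=True)
    pattern = re.compile(
        rf"\b(\d{{1,2}})\.?\s+({'|'.join(names)})\s+(\d{{4}})\b",
        re.IGNORECASE,
    )
    return [
        (int(m.group(3)), months[m.group(2).lower()], int(m.group(1)))
        for m in pattern.finditer(text)
    ]
```

Adapting this sketch to a further language would only require adding a month list to `MONTHS`; the pattern itself stays unchanged.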

artificial intelligence, natural language, place name, (12 more...)

cs/0609063

Country:

Europe > Portugal (0.28)
Europe > France (0.28)
Europe > Austria (0.28)
North America > United States > New Mexico (0.14)

Genre: Research Report (0.40)

Industry: Government > Regional Government > Europe Government (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Steinberger, Ralf, Pouliquen, Bruno, Ignat, Camelia

Navigating multilingual news collections using automatically extracted information

We present a text analysis tool set that allows analysts in various fields to sift through large collections of multilingual news items quickly and to find information that is relevant to them. For a given document collection, the tool set automatically clusters the texts into groups of similar articles, extracts names of places, people and organisations, lists the user-defined specialist terms found, links clusters and entities, and generates hyperlinks. Through its daily news analysis operating on thousands of articles per day, the tool also learns relationships between people and other entities. The fully functional prototype system allows users to explore and navigate multilingual document collections across languages and time.

artificial intelligence, machine learning, natural language, (19 more...)

cs/0609053

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report (0.40)

Industry:

Government (1.00)
Leisure & Entertainment > Sports > Motorsports > Formula One (0.94)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Multilingual person name recognition and transliteration

Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia, Temnikova, Irina, Widiger, Anna, Zaghouani, Wajdi, Zizka, Jan

We present an exploratory tool that extracts person names from multilingual news collections, matches name variants referring to the same person, and infers relationships between people based on the co-occurrence of their names in related news. A novel feature is the matching of name variants across languages and writing systems, including names written with the Greek, Cyrillic and Arabic writing systems. Due to our highly multilingual setting, we use an internal standard representation of names for matching, instead of adopting the traditional bilingual approach to transliteration. This work is part of the news analysis system NewsExplorer that clusters an average of 25,000 news articles per day to detect related news within the same and across different languages.
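The idea of matching through a shared internal representation, rather than through pairwise language-to-language transliteration, can be sketched as follows. This is only a minimal illustration and not NewsExplorer's actual procedure: the tiny `CYRILLIC_TO_LATIN` table and the `canonical_form` function are hypothetical, and a real system would need full transliteration rules for each script.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (illustration only).
CYRILLIC_TO_LATIN = {
    "а": "a", "в": "v", "д": "d", "и": "i", "л": "l", "м": "m",
    "н": "n", "п": "p", "р": "r", "т": "t", "у": "u",
}


def canonical_form(name):
    """Map a name variant to a script-independent comparison key."""
    lowered = name.lower()
    # Step 1: transliterate any character covered by the table.
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in lowered)
    # Step 2: drop diacritics by decomposing and removing combining marks.
    decomposed = unicodedata.normalize("NFKD", latin)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 3: collapse runs of whitespace.
    return " ".join(stripped.split())


def same_person_candidate(name_a, name_b):
    """Two variants are match candidates if their canonical forms agree."""
    return canonical_form(name_a) == canonical_form(name_b)
```

Because every variant is mapped to the same key space, supporting an additional script means adding one table rather than one table per language pair.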

machine learning, natural language, variant, (19 more...)

cs/0609051

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government > Europe Government (0.93)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)

Aubin, Sophie, Hamon, Thierry

Improving Term Extraction with Terminological Resources

Studies of different term extractors on a biomedical corpus revealed decreased performance when they are applied to highly technical texts. The difficulty or impossibility of customising them to new domains is an additional limitation. In this paper, we propose to use external terminologies to influence generic linguistic data in order to improve the quality of the extraction. The tool we implemented exploits attested terms at different steps of the process: chunking, parsing and extraction of term candidates. Experiments reported here show that, using this method, more term candidates can be acquired with a higher level of reliability. We further describe the extraction process involving endogenous disambiguation implemented in the term extractor YaTeA.

artificial intelligence, natural language, term candidate, (17 more...)

cs/0609019

Country: Europe > France (0.28)

Genre:

Research Report (0.70)
Workflow (0.48)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.57)

Malyshkin, Vladislav, Bakhramov, Ray, Gorodetsky, Andrey

A Massive Local Rules Search Approach to the Classification Problem

An approach to the classification problem of machine learning, based on building local classification rules, is developed. The local rules are considered as projections of the global classification rules to the event we want to classify. A massive global optimization algorithm is used to optimize the quality criterion. The algorithm, which has polynomial complexity in the typical case, is used to find all high-quality local rules. Another distinctive feature of the algorithm is the integration of attribute level selection (for ordered attributes) with rule search, together with an original strategy for resolving conflicting rules. The algorithm is practical; it was tested on a number of data sets from the UCI repository, and a comparison with other prediction techniques is presented.

algorithm, artificial intelligence, machine learning, (17 more...)

cs/0609007

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Slepoy, A., Thompson, A. P., Peters, M. D.

Searching for Globally Optimal Functional Forms for Inter-Atomic Potentials Using Parallel Tempering and Genetic Programming

We develop a Genetic Programming-based methodology that enables discovery of novel functional forms for classical inter-atomic force-fields used in molecular dynamics simulations. Unlike previous efforts in the field, which fit only the parameters of fixed functional forms, we use a novel algorithm to search the space of many possible functional forms. While a follow-on practical procedure will use experimental and ab initio data to find an optimal functional form for a force-field, we first validate the approach using a manufactured solution. This validation has the advantage of a well-defined metric of success. We manufactured a training set of atomic coordinate data with an associated set of global energies using the well-known Lennard-Jones inter-atomic potential. We performed an automatic functional form fitting procedure starting with a population of random functions, using a genetic programming functional formulation and a parallel tempering Metropolis-based optimization algorithm. Our massively parallel method independently discovered the Lennard-Jones function after searching for several hours on 100 processors and covering a minuscule portion of the configuration space. We find that the method is suitable for unsupervised discovery of functional forms for inter-atomic potentials/force-fields. We also find that our parallel tempering Metropolis-based approach significantly improves the optimization convergence time and takes good advantage of the parallel cluster architecture.
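For readers unfamiliar with the two ingredients named in this abstract, the sketch below shows the textbook 12-6 Lennard-Jones pair potential, a pairwise sum that could produce the "global energies" of a manufactured training set, and the standard parallel tempering swap acceptance rule. The abstract does not give the authors' exact cost function, parameters, or swap schedule, so the function names, the reduced units (`epsilon = sigma = 1`), and the use of the textbook swap rule are assumptions.

```python
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Textbook 12-6 Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all distinct atom pairs."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])  # requires Python 3.8+
            energy += lennard_jones(r, epsilon, sigma)
    return energy


def swap_probability(cost_a, cost_b, temp_a, temp_b):
    """Standard parallel tempering acceptance probability for exchanging
    the states of two replicas held at temperatures temp_a and temp_b."""
    delta = (1.0 / temp_a - 1.0 / temp_b) * (cost_a - cost_b)
    return 1.0 if delta >= 0 else math.exp(delta)
```

In a functional-form search, `cost_a` and `cost_b` would be the fitting errors of two candidate functions; the swap rule lets a good candidate found at a high temperature migrate to a colder replica, where it is refined.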

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

cs/0608078

Genre: Research Report (0.64)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Expressing Implicit Semantic Relations without Supervision

Turney, Peter D.

We present an unsupervised learning algorithm that mines large text corpora for patterns that express implicit semantic relations. For a given input word pair X:Y with some unspecified semantic relations, the corresponding output list of patterns is ranked according to how well each pattern Pi expresses the relations between X and Y. For example, given X=ostrich and Y=bird, the two highest ranking output patterns are "X is the largest Y" and "Y such as the X". The output patterns are intended to be useful for finding further pairs with the same relations, to support the construction of lexicons, ontologies, and semantic networks. The patterns are sorted by pertinence, where the pertinence of a pattern Pi for a word pair X:Y is the expected relational similarity between the given pair and typical pairs for Pi. The algorithm is empirically evaluated on two tasks, solving multiple-choice SAT word analogy questions and classifying semantic relations in noun-modifier pairs. On both tasks, the algorithm achieves state-of-the-art results, performing significantly better than several alternative pattern ranking algorithms, based on tf-idf.
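The candidate-generation step implied by the abstract's example (ostrich:bird yielding "X is the largest Y") can be sketched as below. This is a simplified illustration, not the paper's algorithm: the `candidate_patterns` function and its `max_gap` parameter are hypothetical, and the patterns are merely counted by raw frequency, whereas the paper ranks them by pertinence.

```python
import re
from collections import Counter


def candidate_patterns(sentences, x, y, max_gap=3):
    """Count phrases linking x and y, with the pair replaced by X and Y."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token not in (x, y):
                continue
            other = y if token == x else x
            # Look ahead for the partner word, allowing up to max_gap
            # intervening words.
            for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
                if tokens[j] == other:
                    first = "X" if token == x else "Y"
                    second = "Y" if first == "X" else "X"
                    middle = tokens[i + 1:j]
                    counts[" ".join([first, *middle, second])] += 1
                    break
    return counts
```

Applied to a large corpus, the resulting counts would only provide the raw candidates; a ranking step such as the pertinence measure described in the abstract is still needed to separate informative patterns from frequent but generic ones.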

artificial intelligence, natural language, pertinence, (18 more...)

cs/0607120

Country: North America > Canada (0.46)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (0.50)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Pyysalo, Sampo, Salakoski, Tapio, Aubin, Sophie, Nazarenko, Adeline

Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. The adapted parser is available under an open-source license at http://www.it.utu.fi/biolg .

artificial intelligence, extension, natural language, (19 more...)

cs/0606119

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report > Experimental Study (0.47)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.96)

Predictions as statements and decisions

Vovk, Vladimir

This paper is based on my invited talk at the 19th Annual Conference on Learning Theory (Pittsburgh, PA, June 24, 2006). In recent years COLT invited talks have tended to aim at establishing connections between the traditional concerns of the learning community and the work done by other communities (such as game theory, statistics, information theory, and optimization). Following this tradition, I will argue that some ideas from the foundations of probability can be fruitfully applied in competitive on-line learning. In this paper I will use the following informal taxonomy of predictions (reminiscent of Shafer's [36], Figure 2, taxonomy of probabilities): D-predictions are mere Decisions. They can never be true or false but can be good or bad.

artificial intelligence, machine learning, prediction, (15 more...)

cs/0606093

Country:

Europe > United Kingdom > England (0.28)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.24)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)

Kryssanov, V. V., Tamaki, H., Kitamura, S.

Evolutionary Design: Philosophy, Theory, and Application Tactics

Although they have contributed to remarkable improvements in some specific areas, attempts to develop a universal design theory have generally failed. This paper sketches arguments for a new approach to engineering design based on semiotics, the science of signs. The approach aims to combine different design theories across all product life cycle stages into one coherent and traceable framework. It also aims to bring together the designer's and the user's understandings of the notion of a 'good product'. Building on the insight from the natural sciences that complex systems always exhibit self-organizing, meaning-influential hierarchical dynamics, objective laws controlling product development are found through an examination of design as a semiosis process. These laws are then applied to support the evolutionary design of products. An experiment validating some of the theoretical findings is outlined, and concluding remarks are given.

artificial intelligence, human computer interaction, semiosis, (17 more...)

cs/0606039

Country: Asia > Japan (0.28)

Genre: Research Report (0.40)

Industry: Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Human Computer Interaction (0.93)