Finding definitions in huge text collections is a challenging problem, not only because of the many ways in which definitions can be conveyed in natural language texts but also because the definiendum (i.e., the thing to be defined) does not, on its own, have enough discriminative power to allow selection of definition-bearing passages from the collection. We have developed a method that uses already available external sources to gather knowledge about the definiendum before trying to define it using the given text collection. This knowledge consists of lists of relevant secondary terms that frequently co-occur with the definiendum in definition-bearing passages, or "definiens". The external sources used to gather secondary terms are an online encyclopedia, a lexical database, and the Web. These secondary terms, together with the definiendum, are used to select passages from the text collection via information retrieval. Further linguistic analysis is carried out on each passage to extract definition strings, using a number of criteria including the presence of main and secondary terms or definition patterns.
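A minimal sketch of the passage-selection idea described above, assuming a simple additive score (the weights, pattern set, and example terms below are invented for illustration, not the paper's actual criteria):

```python
import re

def score_passage(passage, definiendum, secondary_terms, patterns):
    """Score a candidate passage for definition-bearing content.
    Illustrative sketch; weights and patterns are assumptions."""
    text = passage.lower()
    score = 0.0
    # The definiendum alone is weak evidence, so it gets a modest weight.
    if definiendum.lower() in text:
        score += 1.0
    # Secondary terms gathered from external sources add discriminative power.
    score += sum(0.5 for t in secondary_terms if t.lower() in text)
    # Copular definition patterns such as "X is a ..." are strong evidence.
    for pat in patterns:
        if re.search(pat.format(term=re.escape(definiendum.lower())), text):
            score += 2.0
    return score

patterns = [r"{term} is an?\b", r"{term},? (?:also )?known as\b"]
passage = "A quasar is an extremely luminous active galactic nucleus."
print(score_passage(passage, "quasar", ["luminous", "galactic"], patterns))
# -> 4.0 (definiendum + two secondary terms + one definition pattern)
```

Passages scoring above a threshold would then be handed to the linguistic analysis stage for definition-string extraction.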
We introduce an interlingua-based approach to cross-language information retrieval, in which queries, as well as documents, are mapped onto a language-independent concept layer and retrieval operations are performed at the level of that interlingua. This approach is contrasted with one that operates without such an intermediary concept level: non-English queries (German ones, in our experiments) are directly translated into English queries, which are subsequently processed against English documents. We provide an empirical evaluation of both alternatives on a large medical document collection.
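The interlingua idea can be illustrated with a toy concept lexicon (the concept IDs and term mappings below are invented; a real system would use a large medical thesaurus):

```python
# Surface terms in either language map to language-independent concept IDs.
LEXICON = {
    "herz": "C_HEART", "heart": "C_HEART",
    "infarkt": "C_INFARCTION", "infarction": "C_INFARCTION",
    "attack": "C_INFARCTION",
}

def to_concepts(text):
    """Map a text onto the concept layer, ignoring unknown words."""
    return {LEXICON[w] for w in text.lower().split() if w in LEXICON}

def retrieve(query, docs):
    """Rank documents by concept overlap with the query."""
    q = to_concepts(query)
    scored = [(len(q & to_concepts(d)), d) for d in docs]
    return [d for s, d in sorted(scored, reverse=True) if s > 0]

docs = ["heart attack treatment", "knee surgery overview"]
print(retrieve("Herz Infarkt", docs))  # the German query matches at the concept level
```

The direct-translation alternative would instead rewrite "Herz Infarkt" as English keywords and run an ordinary term-level retrieval.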
This paper describes a vector space equalization scheme for a concept-based collaborative information retrieval system, along with evaluation results. The authors previously proposed a peer-to-peer information exchange system that aims at smooth knowledge and information management to activate organizations and communities. One problem with the system arises when information is retrieved from another user's personal repository, since the framework's retrieval criteria are strongly personalized. The system is assumed to employ a vector space model and a concept base as its information retrieval mechanism. Because the vector space of one system is very different from that of another, retrieval results would not reflect the requester's intention. This paper presents a vector space equalization scheme, the automated relevance feedback scheme, that compensates for the differences between the vector spaces of the personal repositories. A system that implements the scheme is realized and evaluated using documents from the Internet. This paper presents implementation details, the evaluation procedure, and evaluation results.
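The automated relevance feedback idea can be sketched with classic Rocchio feedback, a textbook stand-in for the authors' equalization scheme (the vectors and parameter values below are invented examples):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio relevance feedback: shift the query vector
    toward judged-relevant documents and away from non-relevant ones.
    A textbook sketch, not the authors' exact equalization scheme."""
    def mean(vecs):
        return [sum(xs) / len(vecs) for xs in zip(*vecs)]
    q = [alpha * x for x in query]
    if relevant:
        q = [a + beta * b for a, b in zip(q, mean(relevant))]
    if nonrelevant:
        q = [a - gamma * b for a, b in zip(q, mean(nonrelevant))]
    return q

# The requester's query drifts toward the vocabulary of documents judged
# relevant in the peer's vector space, compensating for the mismatch.
print(rocchio([1.0, 0.0], relevant=[[0.0, 1.0]], nonrelevant=[]))  # -> [1.0, 0.75]
```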
We present a new approach to web search engines. The web creates new challenges for information retrieval. The vast improvement in information access is not the only advantage resulting from keyword search: much potential also exists for analyzing interests and relationships within the structure of the web. A hyperlink created by the author of a web page explicitly represents a relationship between the source and destination pages; in aggregate, these links form the hyperlink structure of the web.
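Exploiting hyperlink structure is commonly illustrated by PageRank-style power iteration over a link graph (a generic sketch of link analysis, not necessarily the specific approach this abstract proposes):

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank over a tiny link graph: a page is
    important if important pages link to it. Generic illustration only."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for src, outs in links.items():
            share = rank[src] / len(outs) if outs else 0.0
            for dst in outs:
                new[dst] += d * share
        rank = new
    return rank

# Page "c" is linked to by both "a" and "b", so it ranks highest.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(links))
```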
Document retrieval in languages with a rich and complex morphology - particularly in terms of derivation and (single-word) composition - suffers from serious performance degradation with the stemming-only query-term-to-text-word matching paradigm. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, prefixes, and suffixes), and these subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a biomedical document collection.
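A minimal sketch of subword segmentation, assuming greedy longest-match against a small subword lexicon (the German medical subwords below are invented examples, and a real segmenter would be considerably more careful):

```python
# Toy subword lexicon; real systems derive this from a curated dictionary.
SUBWORDS = {"leber", "zell", "karzinom", "entzuendung"}

def segment(word, lexicon=SUBWORDS, min_len=3):
    """Split a morphologically complex word into known subwords,
    longest match first; an unmatched remainder is kept as-is."""
    word = word.lower()
    result, i = [], 0
    while i < len(word):
        for j in range(len(word), i + min_len - 1, -1):
            if word[i:j] in lexicon:
                result.append(word[i:j])
                i = j
                break
        else:
            result.append(word[i:])
            break
    return result

# The compound "Leberzellkarzinom" (liver cell carcinoma) splits into
# subwords that can each be indexed and matched independently.
print(segment("Leberzellkarzinom"))  # -> ['leber', 'zell', 'karzinom']
```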
Recent research has shown that a case-based perspective on collaborative filtering for recommendation can provide significant benefits in decision support accuracy over traditional collaborative techniques, particularly as dataset sparsity increases. These benefits derive both from the use of more sophisticated case-based similarity metrics and from the proactive maintenance of item similarity knowledge using data mining. This paper presents a natural next step in the work by validating these findings in the context of more complex models of collaborative filtering, as well as by demonstrating that such techniques also preserve recommendation diversity.
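Case-based item similarity can be sketched as a weighted match over item features (the feature names and weights below are invented; the paper's similarity metrics are more sophisticated and are mined from data):

```python
def case_similarity(a, b, weights):
    """Weighted feature-match similarity between two item 'cases'.
    Feature-based similarity stays usable even when the ratings
    matrix is sparse, since it needs no co-rating users."""
    matched = sum(w for f, w in weights.items() if a.get(f) == b.get(f))
    return matched / sum(weights.values())

item_a = {"genre": "scifi", "director": "x", "decade": "1990s"}
item_b = {"genre": "scifi", "director": "y", "decade": "1990s"}
weights = {"genre": 2, "director": 1, "decade": 1}
print(case_similarity(item_a, item_b, weights))  # -> 0.75
```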
We have developed a model-based, distributed architecture that integrates diverse components in a system designed for lunar and planetary surface operations: an astronaut's space suit, cameras, all-terrain vehicles, a robotic assistant, a crew in a local habitat, and a mission support team. Software processes ("agents") implemented in the Brahms language run on multiple mobile platforms. These "mobile agents" interpret and transform available data to help people and robotic systems coordinate their actions, making operations safer and more efficient. The Brahms-based mobile agent architecture (MAA) uses a novel combination of agent types so that the software agents may understand and facilitate communications between people and between system components. A state-of-the-art spoken dialogue interface is integrated with Brahms models, supporting a speech-driven field observation record and rover command system. An important aspect of the methodology involves first simulating the entire system in Brahms and then configuring the agents into a runtime system. Thus, Brahms provides a language, engine, and system builder's toolkit for specifying and implementing multiagent systems.
The World Wide Web is developing rapidly, but neither the recall nor the precision of traditional search engines can satisfy the increasing demands of users. Presently, RDF is widely accepted as a standard for the semantic representation of information on the Web, which makes advanced search among web resources possible. In this paper, we introduce an approach to semantic search based on matching RDF graphs. A new similarity measure between RDF graphs is defined, and ontologies on arcs as well as on nodes are employed. The implementation of a demonstration system based on our method is currently in progress.
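Graph matching with ontology-generalized nodes and arcs can be sketched as follows (the ontology, triples, and similarity definition below are deliberately simple stand-ins for the measure the paper defines):

```python
# Toy RDF graphs as sets of (subject, predicate, object) triples.
# The ontology maps terms to more general concepts; the entries are invented.
ONTOLOGY = {"dog": "animal", "cat": "animal", "hasPet": "owns"}

def generalize(term):
    return ONTOLOGY.get(term, term)

def triple_match(t1, t2):
    """Two triples match if each position is equal outright or after
    one step of ontology generalization (applied to nodes and arcs)."""
    return all(a == b or generalize(a) == generalize(b) for a, b in zip(t1, t2))

def graph_similarity(g1, g2):
    """Fraction of triples in g1 that match some triple in g2."""
    if not g1:
        return 0.0
    matched = sum(1 for t1 in g1 if any(triple_match(t1, t2) for t2 in g2))
    return matched / len(g1)

g1 = {("alice", "hasPet", "dog")}
g2 = {("alice", "owns", "cat")}
print(graph_similarity(g1, g2))  # -> 1.0: terms match at the ontology level
```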
The importance of Web services has been recognized and widely accepted by industry and academic research. However, the two worlds have proposed solutions that progress along different dimensions. Academic research has been mostly concerned with the expressiveness of service descriptions, while industry has focused on the modularization of service layers -- mostly for usability in the short term. This paper is concerned with merging these two streams of progress. Our point of departure is the current proposal by IBM. This proposal is extended with Semantic Web technologies such that a smooth evolution from Web services in the current Web to Web services in the Semantic Web appears possible and -- in fact -- highly desirable. As a showcase, we describe SWOBIS, an ontology-compatible registry for software tools that represents a first step towards developing a search engine for Web services based on Semantic Web technologies.