Ontology Translation for Interoperability Among Semantic Web Services

AI Magazine

Research on semantic web services promises greater interoperability among software agents and web services by enabling content-based automated service discovery and interaction and by utilizing . Although this is to be based on use of shared ontologies published on the semantic web, services produced and described by different developers may well use different, perhaps partly overlapping, sets of ontologies. Interoperability will depend on ontology mappings and architectures supporting the associated translation processes. The question we ask is, does the traditional approach of introducing mediator agents to translate messages between requestors and services work in such an open environment? This article reviews some of the processing assumptions that were made in the development of the semantic web service modeling ontology OWL-S and argues that, as a practical matter, the translation function cannot always be isolated in mediators. Ontology mappings need to be published on the semantic web just as ontologies themselves are. The translation for service discovery, service process model interpretation, task negotiation, service invocation, and response interpretation may then be distributed to various places in the architecture so that translation can be done in the specific goal-oriented informational contexts of the agents performing these processes. We present arguments for assigning translation responsibility to particular agents in the cases of service invocation, response translation, and matchmaking.


The First Conference on E-mail and Anti-Spam

AI Magazine

The First Conference on E-mail and Anti-Spam was held from July 30 to July 31, 2004 in Mountain View, California. The conference, attended by 180 researchers, featured 29 papers that covered a number of topics, including e-mail in general, nonstatistical techniques for stopping spam, machine learning techniques, issues of identity in e-mail, as well as law and policy. The 2005 conference will be held at Stanford University from July 21 to 22.


The Sixth International Conference on Enterprise Information Systems (ICEIS 2004)

AI Magazine

The Sixth International Conference on Enterprise Information Systems (ICEIS) was held in Porto, Portugal; previous venues were in Spain, France, and the United Kingdom. Since its inception in 1999, ICEIS has grown steadily, and is now one of the largest international conferences in the area of information systems. In 2004, more than 600 papers were submitted to the conference and its ten satellite workshops. One of the interesting features of this conference is the high number of invited speakers. In 2004, eighteen keynote speakers were featured at ICEIS and its workshops.


Data Integration: A Logic-Based Perspective

AI Magazine

Data integration is the problem of combining data residing at different autonomous, heterogeneous sources and providing the client with a unified, reconciled global view of the data. We discuss data-integration systems, taking the abstract viewpoint that the global view is an ontology expressed in a class-based formalism. We resort to an expressive description logic, ALCQI, that fully captures class-based representation formalisms, and we show that query answering in data integration, as well as all other relevant reasoning tasks, is decidable. However, when we have to deal with large amounts of data, the high computational complexity in the size of the data makes the use of a full-fledged expressive description logic infeasible in practice. This leads us to consider DL-Lite, a specifically tailored restriction of ALCQI that ensures tractability of query answering in data integration while keeping enough expressive power to capture the most relevant features of class-based formalisms.


Automatic Ontology Matching Using Application Semantics

AI Magazine

We propose the use of application semantics to enhance the process of semantic reconciliation. Application semantics involves those elements of business reasoning that affect the way concepts are presented to users: their layout, and so on. In particular, we pursue in this article the notion of precedence, in which temporal constraints determine the order in which concepts are presented to the user. Existing matching algorithms use either syntactic means (such as term matching and domain matching) or model semantic means, the use of structural information that is provided by the specific data model to enhance the matching process. The novelty of our approach lies in proposing a class of matching techniques that takes advantage of ontological structures and application semantics. As an example, the use of precedence to reflect business rules has not been applied elsewhere, to the best of our knowledge. We have tested the process for a variety of web sites in domains such as car rentals and airline reservations, and we share our experiences with precedence and its limitations.


Semantic Integration in Text: From Ambiguous Names to Identifiable Entities

AI Magazine

Semantic integration focuses on discovering, representing, and manipulating correspondences between entities in disparate data sources. The topic has been widely studied in the context of structured data, with problems being considered including ontology and schema matching, matching relational tuples, and reconciling inconsistent data values. In recent years, however, semantic integration over text has also received increasing attention. This article studies a key challenge in semantic integration over text: identifying whether different mentions of real-world entities, such as "JFK" and "John Kennedy," within and across natural language text documents, actually represent the same concept. We present a machine-learning study of this problem. The first approach is discriminative: a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes (1) a joint distribution over entities (for example, a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), and (2) an "author" model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an "appearance" model that governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90-95 percent F1 measure for different entity types, much better than previous approaches to some aspects of this problem. Finally, we discuss how our solution for mention matching in text can be potentially applied to matching relational tuples, as well as to linking entities across databases and text.
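
As a rough illustration of the discriminative pipeline described in this abstract (a pairwise scorer followed by a global clustering step that uses it as the similarity metric), here is a minimal sketch. It is not the authors' implementation: `pair_score` is a hypothetical stand-in that uses token overlap instead of a trained classifier, and the function names, the single-link merging rule, and the 0.3 threshold are all assumptions chosen for the example.

```python
from itertools import combinations


def pair_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a trained pairwise classifier.

    Returns the Jaccard overlap of lowercase tokens. A real system would
    return a learned estimate that the two mentions co-refer.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster_mentions(mentions, threshold=0.3):
    """Single-link clustering: merge any pair scoring at or above threshold.

    The threshold value is an assumption for this sketch, not a tuned value.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if pair_score(mentions[i], mentions[j]) >= threshold:
            parent[find(j)] = find(i)

    # Group mentions by their root, preserving input order.
    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


mentions = ["John Kennedy", "President Kennedy", "John F. Kennedy", "Roger Clemens"]
clusters = cluster_mentions(mentions)
print(clusters)
```

With this toy scorer the three Kennedy mentions end up in one cluster and "Roger Clemens" in another. Token overlap cannot relate an abbreviation such as "JFK" to "John Kennedy", which is the kind of case a learned pairwise classifier is meant to handle.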


Semantic Integration through Invariants

AI Magazine

A semantics-preserving exchange of information between two software applications requires mappings between logically equivalent concepts in the ontology of each application. The challenge of semantic integration is therefore equivalent to the problem of generating such mappings, determining that they are correct, and providing a vehicle for executing the mappings, thus translating terms from one ontology into another. This article presents an approach toward this goal using techniques that exploit the model-theoretic structures underlying ontologies. With these as inputs, semiautomated and automated components may be used to create mappings between ontologies and perform translations.


Automatically Utilizing Secondary Sources to Align Information Across Sources

AI Magazine

XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.


Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods

Journal of Artificial Intelligence Research

In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on two main methodological approaches in this research area: a knowledge-based method and a corpus-based method. Our hypothesis is that word-sense disambiguation requires several knowledge sources in order to solve the semantic ambiguity of the words. These sources can be of different kinds, for example, syntagmatic, paradigmatic, or statistical information. Our approach combines various sources of knowledge, through combinations of the two WSD methods mentioned above. Mainly, the paper concentrates on how to combine these methods and sources of information in order to achieve good results in the disambiguation. Finally, this paper presents a comprehensive study and experimental work on evaluation of the methods and their combinations.


Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

Journal of Artificial Intelligence Research

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, that is, semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets, and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.