We discuss the problems associated with versioning ontologies in distributed environments. This is an important issue because ontologies can be of great use in structuring and querying Internet information, but many of the Internet's characteristics, such as distributed ownership, rapid evolution, and heterogeneity, make ontology management difficult. We present SHOE, a web-based knowledge representation language that supports multiple versions of ontologies. We then discuss the features of SHOE that address ontology versioning, the effects of ontology revision on SHOE web pages, and methods for implementing ontology integration using SHOE's extension and version mechanisms.

1. Introduction

As the use of ontologies becomes more prevalent, there is a more pressing need for good ontology management schemes. This is especially true once an ontology has been used to structure data, since changing it can be very expensive. Often the solution is to "get it right the first time"; however, in long-term applications, there is always the chance that new information will be discovered or that different features of the domain will become important. Therefore, we must think of ontology development as an ongoing process. In a centralized environment, it may be possible to coordinate ontology revisions with corresponding revisions to the data that was structured using the ontology. However, as the volume of data increases, this becomes more difficult.
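The core idea behind SHOE's version mechanism — that a revised ontology can explicitly declare which earlier versions it remains compatible with, so legacy pages need not be rewritten — can be sketched as a small data model. This is an illustrative sketch only, not SHOE's actual markup syntax; the class and method names are hypothetical.

```python
# Hypothetical sketch (not actual SHOE syntax): an ontology revision
# declares which earlier versions it is backward-compatible with, so
# data committed to an old version remains interpretable under the new one.

class Ontology:
    def __init__(self, name, version, backward_compatible_with=()):
        self.name = name
        self.version = version
        # earlier versions whose commitments this revision still honors
        self.backward_compatible_with = set(backward_compatible_with)

    def accepts(self, committed_version):
        """True if data committed under `committed_version` is valid here."""
        return (committed_version == self.version
                or committed_version in self.backward_compatible_with)

univ_v1 = Ontology("university-ont", "1.0")
univ_v2 = Ontology("university-ont", "2.0", backward_compatible_with={"1.0"})

print(univ_v2.accepts("1.0"))  # True: v2 declares compatibility with v1
print(univ_v1.accepts("2.0"))  # False: v1 knows nothing of the later revision
```

Note the asymmetry: compatibility is declared by the newer version, which matches a distributed setting where old pages cannot be expected to update themselves.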
Building ontologies is a difficult task requiring skills in logics and ontological analysis. Domain experts usually reach only as far as organizing a set of concepts into a hierarchy in which the semantics of the relations is under-specified. The categorization of Wikipedia is a huge concept hierarchy of this form, covering a broad range of areas. We propose an automatic method for bootstrapping domain ontologies from the categories of Wikipedia. The method first selects a subset of concepts that are relevant for a given domain. The relevant concepts are subsequently split into classes and individuals, and, finally, the relations between the concepts are classified into subclass_of, instance_of, part_of, and a generic related_to. We evaluate our method by generating ontology skeletons for the domains of Computing and Music. The quality of the generated ontologies is measured against manually built ground-truth datasets of several hundred nodes.
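The three stages of the method — relevance selection, class/individual splitting, and relation typing — can be sketched as a toy pipeline. This is illustrative only: the paper's stages are learned or corpus-driven, whereas the heuristics below (a seed set, a plural-head rule) and all names are hypothetical stand-ins showing the shape of the pipeline.

```python
# Toy pipeline sketch; heuristics and data are hypothetical stand-ins.

DOMAIN_SEEDS = {"Computing", "Programming languages"}

def select_relevant(categories):
    """Stage 1: keep categories connected to the domain seed set."""
    return [c for c in categories
            if c["name"] in DOMAIN_SEEDS or c["parent"] in DOMAIN_SEEDS]

def is_class(name):
    """Stage 2 stand-in: plural category labels tend to denote classes."""
    return name.endswith("s")

def classify_relation(child, parent):
    """Stage 3: type the edge between a concept and its parent."""
    if is_class(child) and is_class(parent):
        return "subclass_of"
    if not is_class(child) and is_class(parent):
        return "instance_of"
    return "related_to"  # part_of omitted in this toy version

categories = [
    {"name": "Programming languages", "parent": "Computing"},
    {"name": "Python (programming language)", "parent": "Programming languages"},
    {"name": "Baroque composers", "parent": "Music"},  # filtered out as off-domain
]

skeleton = [(c["name"], classify_relation(c["name"], c["parent"]))
            for c in select_relevant(categories)]
# e.g. "Python (programming language)" under the class "Programming
# languages" is typed as instance_of.
```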
Chenthamarakshan, Vijil (IBM T. J. Watson Research Center, Yorktown Heights) | Melville, Prem (IBM T. J. Watson Research Center, Yorktown Heights) | Sindhwani, Vikas (IBM T. J. Watson Research Center, Yorktown Heights) | Lawrence, Richard D. (IBM T. J. Watson Research Center, Yorktown Heights)
The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.
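The shift in human effort described above — from labeling individual documents to associating whole ontology concepts with target classes — can be sketched with a toy example. All data and names below are made up for illustration; the paper's actual pipeline works over the Wikipedia ontology.

```python
# Minimal sketch of the labeling shift: concept descriptions plus
# concept-class associations yield labeled training text "for free".
# All concepts, descriptions, and labels here are hypothetical.

concepts = {
    "Machine learning": "Study of algorithms that improve through experience.",
    "Guitar": "A fretted musical instrument, usually with six strings.",
}

# Human effort: a handful of concept -> class associations,
# instead of per-document labels.
associations = {"Machine learning": "Computing", "Guitar": "Music"}

def induce_training_set(concepts, associations):
    """Turn concept descriptions and associations into labeled examples."""
    return [(concepts[name], label) for name, label in associations.items()]

training = induce_training_set(concepts, associations)
# Each pair is (description text, class label), ready to train a
# text classifier without labeling a single in-domain document.
```

The leverage comes from scale: one concept-class association can contribute an entire rich concept description (or many linked documents) as supervision, rather than a single labeled example.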
Editor's Note: An update to this article has been posted here on 7/14/04. As the hype of past decades fades, the current heir to the artificial intelligence legacy may well be ontologies. Evolving from semantic network notions, modern ontologies are proving quite useful. And they are doing so without relying on the jumble of rule-based techniques common in earlier knowledge representation efforts. These structured depictions or models of known (and accepted) facts are being built today to make a number of applications more capable of handling complex and disparate information.
We propose an ontological theory that is powerful enough to describe both complex spatiotemporal processes (occurrents) and the enduring entities (continuants) that participate therein. The theory is divided into two major categories of sub-theories: (sub-)theories of type SPAN and (sub-)theories of type SNAP. These theories represent two complementary perspectives on reality and result in distinct though compatible systems of categories. In SNAP we have enduring entities such as substances, qualities, roles, and functions; in SPAN we have perduring entities such as processes and their parts and aggregates. We argue that both kinds of ontological theory are required in order to give a non-reductionist account of complex domains of reality.