The availability of massive amounts of raw domain data has created an urgent need for sophisticated AI systems that can find complex and useful information in big-data repositories in real time. Such systems should be able to process and extract significant information from natural-language documents, search and answer complex questions, make sophisticated predictions about future events, and generally interact with users in much more powerful and intuitive ways. To be effective, these systems need a significant amount of domain-specific knowledge in addition to general-domain knowledge. Ontologies and knowledge bases represent knowledge about domains of interest and serve as the backbone for semantic technologies and applications. However, creating such domain models is time-consuming and error-prone, and the end product is difficult to maintain. In this paper, we present a novel methodology to automatically build semantically rich knowledge models for specific domains using domain-relevant unstructured data from resources such as web articles, manuals, e-books, and blogs. We also present evaluation results for our automatic ontology/knowledge-base generation methodology using freely available textual resources from the World Wide Web.
The demand for ontologies is growing rapidly, driven especially by developments in knowledge management, e-commerce, and the Semantic Web. Building an ontology and a background knowledge base manually is so costly and time-consuming that it hampers progress in intelligent information access; semi-automatic or automatic construction of ontologies is therefore very useful. This paper presents a concept-focused ontology-building method based on text-mining technology. The method focuses on one domain concept at a time and actively acquires source documents for the ontological knowledge about that concept.
Term extraction identifies domain-relevant terms in a domain-specific, unstructured corpus; in an organisational setting, the extracted terms can be used for categorisation and information retrieval. Previous statistical approaches to automatic term extraction rely on term frequencies, which may not only hamper accuracy but also lower the rank of, or even discard, domain-relevant yet infrequent terms. This paper aims to minimise the impact of term frequency, and thus improve the precision of the top-k terms, by using a graph-based ranking algorithm with the aid of latent vector representations of terms and term relations embedded in patents rather than in general-domain knowledge sources. We show that the proposed method significantly outperforms previous work.
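As a rough illustration of the idea (a sketch, not the paper's actual algorithm), a PageRank-style walk over a term graph whose edges are weighted by the cosine similarity of latent term vectors scores a term by how strongly it is connected to other candidate terms rather than by how often it occurs. The terms and vectors below are invented for the sketch:

```python
import math

# Hypothetical candidate terms with toy latent vectors (assumptions for
# illustration only; real vectors would come from a trained embedding model).
term_vectors = {
    "semiconductor wafer": [0.9, 0.1, 0.2],
    "etching process":     [0.8, 0.2, 0.3],
    "photoresist layer":   [0.7, 0.3, 0.1],
    "the method":          [0.1, 0.9, 0.8],  # frequent but not domain-relevant
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_terms(vectors, damping=0.85, iters=50):
    """PageRank-style scoring over a similarity-weighted term graph."""
    terms = list(vectors)
    n = len(terms)
    # Edge weights: cosine similarity between latent term vectors.
    w = {t: {s: cosine(vectors[t], vectors[s]) for s in terms if s != t}
         for t in terms}
    out = {t: sum(w[t].values()) for t in terms}  # total outgoing weight
    score = {t: 1.0 / n for t in terms}
    for _ in range(iters):
        score = {t: (1 - damping) / n
                    + damping * sum(score[s] * w[s][t] / out[s]
                                    for s in terms if s != t)
                 for t in terms}
    return sorted(terms, key=score.get, reverse=True)

# The generic phrase ranks last: it is weakly connected to the domain terms,
# regardless of how often it might occur in the corpus.
print(rank_terms(term_vectors))
```

Because scores flow along similarity-weighted edges, an infrequent term that is well connected to other domain terms can still rank highly, which is the frequency-independence the abstract describes.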
The difficulty of domain knowledge acquisition is one of the most significant challenges facing intelligent tutoring systems. Relying on domain experts and building domain models from scratch are not viable solutions. The ability to automatically extract domain knowledge from documents can help overcome these difficulties. In this paper, we use machine learning and natural language processing to parse documents and to generate domain concept maps and ontologies. We also show how an intelligent tutoring system benefits from the generated structures.
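To make the document-to-concept-map step concrete, here is a minimal, hypothetical sketch in which lexical patterns ("X is a Y", "X consists of Y") turn sentences into labelled concept-map edges. The patterns and sentences are assumptions for illustration, not the system's actual grammar:

```python
import re

# Invented example sentences; a real system would parse full documents.
text = (
    "A stack is a data structure. "
    "A queue is a data structure. "
    "A data structure consists of elements."
)

# Simple lexical patterns mapping sentence shapes to relation labels
# (an assumption for this sketch; real systems use richer NLP parsing).
PATTERNS = [
    (re.compile(r"[Aa]n? (\w[\w ]*?) is an? (\w[\w ]*?)\."), "is-a"),
    (re.compile(r"[Aa]n? (\w[\w ]*?) consists of (\w[\w ]*?)\."), "has-part"),
]

def extract_edges(text):
    """Return (source concept, relation, target concept) triples."""
    edges = []
    for pattern, relation in PATTERNS:
        for src, dst in pattern.findall(text):
            edges.append((src, relation, dst))
    return edges

for edge in extract_edges(text):
    print(edge)
```

The resulting triples, e.g. ("stack", "is-a", "data structure"), are exactly the nodes and labelled links a concept map or a lightweight ontology needs.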
We study the performance of two representations of word meaning in learning noun-modifier semantic relations. One representation is based on lexical resources, in particular WordNet; the other is based on a corpus. We experimented with decision trees, instance-based learning, and Support Vector Machines. All of these methods work well on this learning task. We report high precision, recall, and F-score, with little variation in performance across several 10-fold cross-validation runs. The corpus-based method has the advantage of working with data without word-sense annotations and performs well above the baseline. The WordNet-based method, which requires word-sense-annotated data, has higher precision.
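A minimal sketch of the instance-based learning setup, assuming invented feature vectors and relation labels (the paper's actual features and its 10-fold protocol are not reproduced here; leave-one-out stands in for cross-validation on this toy set):

```python
import math

# Toy feature vectors for noun-modifier pairs; the pairs, features, and
# relation labels are invented for this sketch.
train = [
    ([0.9, 0.1], "purpose"),   # e.g. "cooking pot"
    ([0.8, 0.2], "purpose"),   # e.g. "hunting knife"
    ([0.1, 0.9], "material"),  # e.g. "steel bridge"
    ([0.2, 0.8], "material"),  # e.g. "oak table"
]

def classify(x, examples):
    """Instance-based learning: label x like its nearest training example."""
    _, label = min(examples, key=lambda ex: math.dist(ex[0], x))
    return label

# Leave-one-out evaluation (a small stand-in for 10-fold cross-validation):
# hold each example out in turn and classify it from the rest.
correct = 0
for i, (x, y) in enumerate(train):
    rest = train[:i] + train[i + 1:]
    correct += classify(x, rest) == y
print(classify([0.85, 0.15], train), correct / len(train))
```

The same train/evaluate loop applies unchanged whichever representation supplies the feature vectors, which is what makes the WordNet-based and corpus-based variants directly comparable.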