Collaborating Authors

Hien Nguyen

AAAI Conferences

In this paper, we report our development of a hybrid user model for improving a user's effectiveness in a search. Specifically, we dynamically capture a user's intent and combine the captured user intent with the elements of an information retrieval system in a decision-theoretic framework. Our solution is to identify a set of key attributes describing a user's intent and determine the interactions among them. We then build our user model, which we call the IPC model, by capturing these attributes. We further extend this model to combine the captured user intent with the elements of an information retrieval system in a decision-theoretic framework, thus creating a hybrid user model.

Translingual Information Access

AAAI Conferences

With such judgements, we can construct a better term-weighted query for the target-language (TL) search, essentially producing true translingual relevance feedback (RF). This RF process can also be used to enhance the source-language (SL) query and search other SL databases at no extra cost to, or involvement from, the analyst. The envisioned mechanism is shown in Figure 3 and encompasses the following steps:

1. The analyst types in a source-language query Qs.
2. The source half of the parallel corpus is searched by an engine using Qs.
3. One of the following methods is used to search the TL document database:
   a. From the retrieved SL/TL document pairs, the TL document contents are used as a new query QT to search the TL document database; or
   b. The retrieved SL/TL document pairs are first given back to the analyst, who scans the SL documents for relevance; the Rocchio formula is then used for both the SL and TL document database searches.
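The Rocchio step mentioned above can be illustrated with a minimal sketch. It assumes documents and queries are sparse term-weight dictionaries, and the function name `rocchio` and the coefficient defaults (alpha, beta, gamma) are illustrative choices, not values taken from the paper. Because each judged SL document has a paired TL document, the same relevance judgements can feed two calls: one over the SL halves to refine Qs, and one over the TL halves to build QT.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a re-weighted query as a dict mapping term -> weight.

    query        : dict of term -> weight (may be empty)
    relevant     : list of document vectors judged relevant
    non_relevant : list of document vectors judged non-relevant
    The coefficient defaults are assumptions for illustration only.
    """
    updated = defaultdict(float)

    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight

    # Add the centroid of relevant documents and subtract the centroid
    # of non-relevant documents.
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)

    # Terms whose weight is not positive are dropped from the new query.
    return {term: w for term, w in updated.items() if w > 0}


# Hypothetical judged pair: the analyst marks one SL document relevant,
# and its TL counterpart is used to build the TL query from scratch.
sl_query = {"oil": 1.0}
sl_relevant = [{"oil": 1.0, "petroleum": 2.0}]
sl_non_relevant = [{"olive": 1.0}]
tl_relevant = [{"petrole": 2.0, "brut": 1.0}]

new_sl_query = rocchio(sl_query, sl_relevant, sl_non_relevant)
new_tl_query = rocchio({}, tl_relevant, [])
```

Starting the TL call from an empty query reflects the case where no translated query exists and the TL query is derived entirely from the judged document pairs.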

Knowledge Representation, Learning, and Reasoning in WebDoc - A Web Document Classification System

AAAI Conferences

This paper describes a novel approach to knowledge representation, learning, and reasoning in WebDoc, a system that classifies Web documents according to the Library of Congress classification system. We argue that an automatically constructed domain-independent knowledge base is indispensable. The WebDoc system builds a knowledge base (represented as a semantic network) that contains the Library of Congress subject headings and their relationships. Through training on human-indexed and NLP-parsed Web documents, WebDoc modifies the semantic network and generates rules for future index generation tasks.

Domain Specific Knowledge-based Information Retrieval Model using Knowledge Reduction

AAAI Conferences

Information is a meaningful collection of data. Information retrieval (IR) is an important tool for turning data into information. Of the three classical IR models (Boolean, Support Vector Machine, and Probabilistic), the Support Vector Machine (SVM) IR model is most widely used. However, this model does not capture enough of the relevance between a query and documents to produce effective results that reflect knowledge. To augment the IR process with knowledge, several techniques have been proposed, including query expansion using a thesaurus, term relationship measures such as Latent Semantic Indexing (LSI), and probabilistic inference engines using Bayesian networks. Our research aims to create an information retrieval model that incorporates domain-specific knowledge to provide knowledgeable answers to users. We use a knowledge-based model to represent domain-specific knowledge. Unlike other knowledge-based IR models, our model converts domain-specific knowledge to a relationship of terms represented as quantitative values, which gives improved efficiency.

Natural language processing for word sense disambiguation and information extraction

Artificial Intelligence

This research work deals with Natural Language Processing (NLP) and the extraction of essential information in an explicit form. The most common information management strategies are Document Retrieval (DR) and Information Filtering. DR systems may work as combine harvesters, which bring back useful material from the vast fields of raw material. With a large amount of potentially useful information in hand, an Information Extraction (IE) system can then transform the raw material by refining and reducing it to a germ of the original text. A Document Retrieval system collects the relevant documents carrying the required information from the repository of texts. An IE system then transforms them into information that is more readily digested and analyzed. It isolates relevant text fragments, extracts relevant information from the fragments, and then arranges the targeted information in a coherent framework. The thesis presents a new approach for Word Sense Disambiguation using a thesaurus. The illustrative examples support the effectiveness of this approach for speedy and effective disambiguation. A Document Retrieval method based on Fuzzy Logic is described and its application is illustrated. A question-answering system illustrates the operation of information extraction from the retrieved text documents. The process of information extraction for answering a query is considerably simplified by using a Structured Description Language (SDL), which is based on cardinals of queries in the form of who, what, when, where, and why. The thesis concludes with the presentation of a novel strategy, based on the Dempster-Shafer theory of evidential reasoning, for document retrieval and information extraction. This strategy permits relaxation of many limitations that are inherent in the Bayesian probabilistic approach.
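As a rough illustration of the evidential reasoning mentioned above, the following sketch implements Dempster's rule of combination for two mass functions. It is not the thesis's retrieval strategy, whose details are not given here. The two-element frame (`REL`, `IRR`), the mass values, and the function name `combine` are hypothetical. Mass assigned to the whole frame represents ignorance, which is one way Dempster-Shafer theory avoids forcing a full probability assignment as a Bayesian model would.

```python
def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses to a
    mass in [0, 1], with the masses summing to 1.
    """
    combined = {}
    conflict = 0.0

    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            product = mass_b * mass_c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + product
            else:
                # Mass falling on the empty set measures the conflict.
                conflict += product

    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")

    # Renormalize so the remaining masses sum to 1.
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}


# Hypothetical frame: a document is either relevant or irrelevant.
REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})
FRAME = REL | IRR

# Two hypothetical evidence sources, e.g. a title match and a body match.
title_evidence = {REL: 0.6, FRAME: 0.4}
body_evidence = {REL: 0.5, IRR: 0.2, FRAME: 0.3}

belief = combine(title_evidence, body_evidence)
```

In this example the two sources agree more than they conflict, so the combined mass on `REL` ends up higher than either source assigned alone, while some mass stays on the full frame as residual uncertainty.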