Goto

Collaborating Authors

 Information Retrieval


Determining the Unithood of Word Sequences using a Probabilistic Approach

arXiv.org Artificial Intelligence

Most research related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work were mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.


On the Use of Automatically Acquired Examples for All-Nouns Word Sense Disambiguation

Journal of Artificial Intelligence Research

This article focuses on Word Sense Disambiguation (WSD), which is a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training examples for Machine Learning systems, also known as the Knowledge Acquisition Bottleneck. A method which has been shown to work for small samples of words is the automatic acquisition of examples. We have previously shown that one of the most promising example acquisition methods scales up and produces a freely available database of 150 million examples from Web snippets for all polysemous nouns in WordNet. This paper focuses on the issues that arise when using those examples, all alone or in addition to manually tagged examples, to train a supervised WSD system for all nouns. The extensive evaluation on both lexical-sample and all-words Senseval benchmarks shows that we are able to improve over commonly used baselines and to achieve top-rank performance. The good use of the prior distributions from the senses proved to be a crucial factor.


Intelligent Peer Networks for Collaborative Web Search

AI Magazine

Collaborative query routing is a new paradigm for Web search that treats both established search engines and other publicly available indices as intelligent peer agents in a search network. The approach makes it transparent for anyone to build their own (micro) search engine, by integrating established Web search services, desktop search, and topical crawling techniques. We present the 6S peer network, which uses machine learning techniques to learn about the changing query environment. We show that simple reinforcement learning algorithms are sufficient to detect and exploit semantic locality in the network, resulting in efficient routing and high-quality search results.


The Information Ecology of Social Media and Online Communities

AI Magazine

Citizens, both young and feeds, and semistructured metadata old, are also discovering how social media in the form of extensible markup language technology can improve their lives and (XML) and resource description give them more voice in the world. We they provide more useful, trustworthy, begin by describing an overarching task of and reliable. Pursuing this task uncovers It differs, however, in ways a number of problems that must be addressed, that affect how it should be modeled, analyzed, three of which we describe in and exploited. The first is recognizing spam model for the general web is as a directed graph of web pages with undifferentiated in the form of spam blogs (splogs) and links between pages. The second is developing has a much richer network structure more effective techniques to recognize in that there are more types of nodes the social structure of blog communities. For example, the abstract model for the underlying blog people who contribute to blogs and au-network structure and how it evolves. Figure 2 shows a hypothetical blog graph and its corresponding flow of information in the influence graph. Studies on influence in social networks and collaboration graphs have typically focused on the task of identifying key individuals who play an important role in propagating information. This is similar to finding authoritative pages on the web.


Intelligent Peer Networks for Collaborative Web Search

AI Magazine

Collaborative query routing is a new paradigm for Web search that treats both established search engines and other publicly available indices as intelligent peer agents in a search network. The approach makes it transparent for anyone to build their own (micro) search engine, by integrating established Web search services, desktop search, and topical crawling techniques. The challenge in this model is that each of these agents must learn about its environment— the existence, knowledge, diversity, reliability, and trustworthiness of other agents — by analyzing the queries received from and results exchanged with these other agents. We present the 6S peer network, which uses machine learning techniques to learn about the changing query environment. We show that simple reinforcement learning algorithms are sufficient to detect and exploit semantic locality in the network, resulting in efficient routing and high-quality search results. A prototype of 6S is available for public use and is intended to assist in the evaluation of different AI techniques employed by the networked agents.


Conditioning Probabilistic Databases

arXiv.org Artificial Intelligence

Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases however often involve the conditioning of a database using additional information in the form of new evidence. The conditioning problem is thus to transform a probabilistic database of priors into a posterior probabilistic database which is materialized for subsequent query processing or further refinement. It turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values. It is known that exact confidence computation is an NP-hard problem. This has led researchers to consider approximation techniques for confidence computation. However, neither conditioning nor exact confidence computation can be solved using such techniques. In this paper we present efficient techniques for both problems. We study several problem decomposition methods and heuristics that are based on the most successful search techniques from constraint satisfaction, such as the Davis-Putnam algorithm. We complement this with a thorough experimental evaluation of the algorithms proposed. Our experiments show that our exact algorithms scale well to realistic database sizes and can in some scenarios compete with the most efficient previous approximation algorithms.


Cooperative Search with Concurrent Interactions

Journal of Artificial Intelligence Research

In this paper we show how taking advantage of autonomous agents' capability to maintain parallel interactions with others, and incorporating it into the cooperative economic search model results in a new search strategy which outperforms current strategies in use. As a framework for our analysis we use the electronic marketplace, where buyer agents have the incentive to search cooperatively. The new search technique is quite intuitive, however its analysis and the process of extracting the optimal search strategy are associated with several significant complexities. These difficulties are derived mainly from the unbounded search space and simultaneous dual affects of decisions taken along the search. We provide a comprehensive analysis of the model, highlighting, demonstrating and proving important characteristics of the optimal search strategy. Consequently, we manage to come up with an efficient modular algorithm for extracting the optimal cooperative search strategy for any given environment. A computational based comparative illustration of the system performance using the new search technique versus the traditional methods is given, emphasizing the main differences in the optimal strategy's structure and the advantage of using the proposed model.


Query-time Entity Resolution

Journal of Artificial Intelligence Research

Entity resolution is the problem of reconciling database references corresponding to the same real-world entities. Given the abundance of publicly available databases that have unresolved entities, we motivate the problem of query-time entity resolution quick and accurate resolution for answering queries over such `unclean' databases at query-time. Since collective entity resolution approaches --- where related references are resolved jointly --- have been shown to be more accurate than independent attribute-based resolution for off-line entity resolution, we focus on developing new algorithms for collective resolution for answering entity resolution queries at query-time. For this purpose, we first formally show that, for collective resolution, precision and recall for individual entities follow a geometric progression as neighbors at increasing distances are considered. Unfolding this progression leads naturally to a two stage `expand and resolve' query processing strategy. In this strategy, we first extract the related records for a query using two novel expansion operators, and then resolve the extracted records collectively. We then show how the same strategy can be adapted for query-time entity resolution by identifying and resolving only those database references that are the most helpful for processing the query. We validate our approach on two large real-world publication databases where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. We then show how the same queries can be answered in real-time using our adaptive approach while preserving the gains of collective resolution. In addition to experiments on real datasets, we use synthetically generated data to empirically demonstrate the validity of the performance trends predicted by our analysis of collective entity resolution over a wide range of structural characteristics in the data.


Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email

Journal of Artificial Intelligence Research

Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not the attributes such as language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model, adding the key attribute that distribution over topics is conditioned distinctly on both the sender and recipient---steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher's email archive, providing evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages. We also present the Role-Author-Recipient-Topic (RART) model, an extension to ART that explicitly represents people's roles.


Practical Approach to Knowledge-based Question Answering with Natural Language Understanding and Advanced Reasoning

arXiv.org Artificial Intelligence

This research hypothesized that a practical approach in the form of a solution framework known as Natural Language Understanding and Reasoning for Intelligence (NaLURI), which combines full-discourse natural language understanding, powerful representation formalism capable of exploiting ontological information and reasoning approach with advanced features, will solve the following problems without compromising practicality factors: 1) restriction on the nature of question and response, and 2) limitation to scale across domains and to real-life natural language text.