Goto

Collaborating Authors

 Information Retrieval


From ``Identical'' to ``Similar'': Fusing Retrieved Lists Based on Inter-Document Similarities

Journal of Artificial Intelligence Research

Methods for fusing document lists that were retrieved in response to a query often utilize the retrieval scores and/or ranks of documents in the lists. We present a novel fusion approach that is based on using, in addition, information induced from inter-document similarities. Specifically, our methods let similar documents from different lists provide relevance-status support to each other. We use a graph-based method to model relevance-status propagation between documents. The propagation is governed by inter-document-similarities and by retrieval scores of documents in the lists. Empirical evaluation demonstrates the effectiveness of our methods in fusing TREC runs. The performance of our most effective methods transcends that of effective fusion methods that utilize only retrieval scores or ranks.


Learning to Order Things

arXiv.org Artificial Intelligence

Journal of Arti ial In telligen e Resear h 10 (1999) 243-270 Submitted 10/98; published 5/99 Learning to Order Things William W. Cohen w ohen resear h.a tt. Here w e onsider the problem of learning ho w to order instan es giv en feedba k in the form of preferen e judgmen ts, i.e., statemen ts to the e e t that one instan e should b e rank ed ahead of another. W e outline a t w o-stage approa h in whi h one rst learns b y on v en tional means a binary pr efer en e fun tion indi ating whether it is advisable to rank one instan e b efore another. Here w e onsider an online algorithm for learning preferen e fun tions that is based on F reund and S hapire's \Hedge " algorithm. In the se ond stage, new instan es are ordered so as to maximize agreemen t with the learned preferen e fun - tion. W e sho w that the problem of nding the ordering that agrees b est with a learned preferen e fun tion is NP-omplete. Nev ertheless, w e des rib e simple greedy algorithms that are guaran teed to nd a go o d appro ximation. Finally, w e sho w ho w metasear h an b e form ulated as an ordering problem, and presen t exp erimen tal results on learning a om-bination of \sear h exp erts," ea h of whi h is a domain-sp e i query expansion strategy for a w eb sear h engine. Ho w ev er, there are man y appli ations in whi h it is desirable to order rather than lassify instan es. Su h orderings ould b e onstru ted based on a learned probabilisti lassi er or regression mo del and in fa t often are. F or instan e, it is ommon pra ti e in information retriev al to rank do umen ts a ording to their probabilit y of relev an e to a query, as estimated b y a learned lassi er for the on ept \relev an t do umen t." An adv an tage of learning orderings dire tly is that preferen e judgmen ts an b e m u h easier to obtain than the lab els required for lassi ation learning. F or instan e, in the email appli ation men tioned ab o v e, one approa h migh t b e to rank messages a ording to their estimated probabilit y of mem b ership in the lass of \urgen t" messages, or b y some n umeri al estimate of urgen y obtained b y regression. Supp ose, ho w ev er, that a user is presen ted with an ordered list of email messages, and ele ts to read the third message rst. Giv en this ele tion, it is not ne essarily the ase that message three is urgen t, nor is there suร† ien t information to estimate an y n umeri al urgen y measures.


Building Integrated Opinion Delivery Environment

AAAI Conferences

We introduce a search engine and information retrieval system for providing access to opinion data. Natural language technology of generalization of syntactic parse trees is introduced as a similarity measure between subjects of textual opinions to link them on the fly. Information extraction algorithm for automatic summarization of web pages in the format of Google sponsored links is presented. We outline the usability of the implemented system, integrated opinion delivery environment (IODE).


Quantum Structure in Cognition: Fundamentals and Applications

arXiv.org Artificial Intelligence

Experiments in cognitive science and decision theory show that the ways in which people combine concepts and make decisions cannot be described by classical logic and probability theory. This has serious implications for applied disciplines such as information retrieval, artificial intelligence and robotics. Inspired by a mathematical formalism that generalizes quantum mechanics the authors have constructed a contextual framework for both concept representation and decision making, together with quantum models that are in strong alignment with experimental data. The results can be interpreted by assuming the existence in human thought of a double-layered structure, a 'classical logical thought' and a 'quantum conceptual thought', the latter being responsible of the above paradoxes and nonclassical effects. The presence of a quantum structure in cognition is relevant, for it shows that quantum mechanics provides not only a useful modeling tool for experimental data but also supplies a structural model for human and artificial thought processes. This approach has strong connections with theories formalizing meaning, such as semantic analysis, and has also a deep impact on computer science, information retrieval and artificial intelligence. More specifically, the links with information retrieval are discussed in this paper.


Towards an automated query modification assistant

arXiv.org Artificial Intelligence

Users who need several queries before finding what they need can benefit from an automatic search assistant that provides feedback on their query modification strategies. We present a method to learn from a search log which types of query modifications have and have not been effective in the past. The method analyses query modifications along two dimensions: a traditional term-based dimension and a semantic dimension, for which queries are enriches with linked data entities. Applying the method to the search logs of two search engines, we identify six opportunities for a query modification assistant to improve search: modification strategies that are commonly used, but that often do not lead to satisfactory results.


Emerging Topic Detection for Business Intelligence Via Predictive Analysis of 'Meme' Dynamics

AAAI Conferences

Detecting and characterizing emerging topics of discussion and consumer trends through analysis of Internet data is of great interest to businesses. This paper considers the problem of monitoring the Web to spot emerging memes โ€“ distinctive phrases which act as โ€œtracersโ€ for topics โ€“ as a means of early detection of new topics and trends. We present a novel methodology for predicting which memes will propagate widely, appearing in hundreds or thousands of blog posts, and which will not, thereby enabling discovery of significant topics. We begin by identifying measurables which should be predictive of meme success. Interestingly, these metrics are not those traditionally used for such prediction but instead are subtle measures of meme dynamics. These metrics form the basis for learning a classifier which predicts, for a given meme, whether or not it will propagate widely. The utility of the prediction methodology is demonstrated through analysis of a sample of 200 memes which emerged online during the second half of 2008.


Refining Recency Search Results with User Click Feedback

arXiv.org Artificial Intelligence

Traditional machine-learned ranking systems for web search are often trained to capture stationary relevance of documents to queries, which has limited ability to track non-stationary user intention in a timely manner. In recency search, for instance, the relevance of documents to a query on breaking news often changes significantly over time, requiring effective adaptation to user intention. In this paper, we focus on recency search and study a number of algorithms to improve ranking results by leveraging user click feedback. Our contributions are three-fold. First, we use real search sessions collected in a random exploration bucket for \emph{reliable} offline evaluation of these algorithms, which provides an unbiased comparison across algorithms without online bucket tests. Second, we propose a re-ranking approach to improve search results for recency queries using user clicks. Third, our empirical comparison of a dozen algorithms on real-life search data suggests importance of a few algorithmic choices in these applications, including generalization across different query-document pairs, specialization to popular queries, and real-time adaptation of user clicks.


Intelligent Semantic Web Search Engines: A Brief Survey

arXiv.org Artificial Intelligence

The World Wide Web (WWW) allows the people to share the information (data) from the large database repositories globally. The amount of information grows billions of databases. We need to search the information will specialize tools known generically search engine. There are many of search engines available today, retrieving meaningful information is difficult. However to overcome this problem in search engines to retrieve meaningful information intelligently, semantic web technologies are playing a major role.


Optimal Web-Scale Tiering as a Flow Problem

Neural Information Processing Systems

We present a fast online solver for large scale maximum-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 80 Million web pages on a layered set of caches to serve an incoming query stream optimally. We provide an empirical demonstration of the effectiveness of our method on real query-pages data.


b-Bit Minwise Hashing for Estimating Three-Way Similarities

Neural Information Processing Systems

Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b>= 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that $b$-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.