AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought comes in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to proceed successfully at a human's slow speed.

NewsFinder: Automating an AI News Service

Eckroth, Joshua (The Ohio State University) | Dong, Liang (Clemson University) | Smith, Reid G. (Marathon Oil Corporation) | Buchanan, Bruce G. (University of Pittsburgh)

AI Magazine, Jul-1-2012

NewsFinder automates the steps involved in finding, selecting, categorizing, and publishing news stories that meet relevance criteria for the Artificial Intelligence community. The software combines a broad search of online news sources with topic-specific trained models and heuristics. Since August 2010, the program has been used to operate the AI in the News service that is part of the AAAI AITopics website.
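
The selecting and categorizing steps described above can be pictured as a filter over scored stories. This is a minimal sketch under stated assumptions, not NewsFinder's code: the `Story` record, the `relevance` score (assumed to come from some trained model), the 0.5 threshold, and the `TOPIC_KEYWORDS` table are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    title: str
    text: str
    relevance: float  # hypothetical score in [0, 1] from some trained relevance model
    topics: list = field(default_factory=list)

# Hypothetical topic table; NewsFinder's real categories and trained models are not reproduced here.
TOPIC_KEYWORDS = {
    "robots": {"robot", "robots", "robotics"},
    "machine learning": {"learning", "neural", "classifier"},
}

def select_and_categorize(stories, threshold=0.5):
    """Keep stories scoring at least `threshold`, drop repeated titles
    (keeping the highest-scoring copy), and tag each kept story with every
    topic whose keywords occur in its text."""
    seen_titles = set()
    published = []
    for story in sorted(stories, key=lambda s: s.relevance, reverse=True):
        key = story.title.strip().lower()
        if story.relevance < threshold or key in seen_titles:
            continue
        seen_titles.add(key)
        words = set(story.text.lower().split())
        story.topics = [topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws]
        published.append(story)
    return published
```

Sorting by relevance before deduplicating means that when two outlets carry the same headline, the copy the model rates higher is the one kept.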

data mining, information retrieval, machine learning

Country: North America > United States (1.00)

Genre:

Research Report (0.68)
Personal (0.46)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Online Structured Prediction via Coactive Learning

Shivaswamy, Pannaga, Joachims, Thorsten

arXiv.org Artificial Intelligence, Jun-27-2012

We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. At each step, the system (e.g., a search engine) receives a context (e.g., a query) and predicts an object (e.g., a ranking). The user responds by correcting the system if necessary, providing a slightly improved, but not necessarily optimal, object as feedback. We argue that such feedback can often be inferred from observable user behavior, for example, from clicks in web search. Evaluating predictions by their cardinal utility to the user, we propose efficient learning algorithms that have $\mathcal{O}(1/\sqrt{T})$ average regret, even though the learning algorithm never observes cardinal utility values as in conventional online learning. We demonstrate the applicability of our model and learning algorithms on a movie recommendation task, as well as on ranking for web search.
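
A perceptron-style update of the kind coactive learning uses can be sketched as $w \leftarrow w + \phi(x, \bar{y}) - \phi(x, y)$, where $y$ is the system's prediction and $\bar{y}$ the user's improved object. The simulation below is a toy illustration under assumptions: candidates are random feature vectors, and the simulated user (with hidden utility weights `true_w`) returns the smallest available improvement. It is not the paper's experimental setup and inherits none of its regret guarantees.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def preference_perceptron(true_w, rounds=500, n_candidates=10, seed=0):
    """Perceptron-style coactive learning against a simulated user.

    Each round draws fresh candidate objects, each represented directly by its
    joint feature vector phi(x, y). The system presents the candidate it scores
    highest; the simulated user answers with the smallest improvement over it,
    or the same object if nothing is better under the hidden utility true_w."""
    rng = random.Random(seed)
    dim = len(true_w)
    w = [0.0] * dim
    regrets = []
    for _ in range(rounds):
        candidates = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_candidates)]
        predicted = max(candidates, key=lambda phi: dot(w, phi))
        u_pred = dot(true_w, predicted)
        better = [phi for phi in candidates if dot(true_w, phi) > u_pred]
        feedback = min(better, key=lambda phi: dot(true_w, phi)) if better else predicted
        # Update w <- w + phi(feedback) - phi(prediction). Because feedback is better
        # under true_w, every nonzero update increases dot(w, true_w).
        w = [wi + fi - pi for wi, fi, pi in zip(w, feedback, predicted)]
        best = max(dot(true_w, phi) for phi in candidates)
        regrets.append(best - u_pred)
    return w, regrets
```

The learner never sees `true_w` or any utility value, only which object the user preferred; the running average of `regrets` is one way to inspect how the toy learner behaves over time.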

information retrieval, machine learning, natural language

1205.4213

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Film (0.35)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.65)

On the Difficulty of Nearest Neighbor Search

He, Junfeng, Kumar, Sanjiv, Chang, Shih-Fu

arXiv.org Machine Learning, Jun-27-2012

Fast approximate nearest neighbor (NN) search in large databases is becoming popular, and several powerful learning-based formulations have been proposed recently. However, little attention has been paid to a more fundamental question: how difficult is (approximate) nearest neighbor search in a given data set? Which data properties affect the difficulty of nearest neighbor search, and how? This paper introduces the first concrete measure, called Relative Contrast, that can be used to evaluate the influence of several crucial data characteristics, such as dimensionality, sparsity, and database size, simultaneously in arbitrary normed metric spaces. Moreover, we present a theoretical analysis showing how the difficulty measure (relative contrast) determines the complexity of Locality Sensitive Hashing, a popular approximate NN search method. Relative contrast also explains why a family of PCA-based heuristic hashing algorithms performs well in practice. Finally, we show that most previous measures of NN search meaningfulness or difficulty can be derived as special asymptotic cases of the proposed measure for dense vectors.
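
Relative contrast, as we read the paper's definition, compares the expected mean distance from a query to the database with the expected nearest-neighbor distance, $C_r = \mathbb{E}_q[D^q_{\text{mean}}] / \mathbb{E}_q[D^q_{\min}]$. A value near 1 means the nearest neighbor is barely closer than a typical point, which makes NN search hard. The sketch below estimates it empirically with Euclidean distance on uniform random data; the data generator and sample sizes are illustrative assumptions.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relative_contrast(database, queries):
    """Empirical C_r: average (over queries) of the mean distance to the database,
    divided by the average nearest-neighbor distance. Assumes no query coincides
    with a database point for every query (otherwise the denominator is zero)."""
    mean_dists, min_dists = [], []
    for q in queries:
        dists = [euclidean(q, x) for x in database]
        mean_dists.append(sum(dists) / len(dists))
        min_dists.append(min(dists))
    return (sum(mean_dists) / len(queries)) / (sum(min_dists) / len(queries))

def random_points(n, dim, rng):
    """Hypothetical test data: n points drawn uniformly from the unit cube."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]
```

Because a mean distance can never be smaller than the minimum, the estimate is always at least 1; for uniform data it tends to fall toward 1 as dimensionality grows, which is one of the effects the measure is meant to capture.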

information retrieval, machine learning, natural language

1206.6411

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Keyphrase Based Arabic Summarizer (KPAS)

El-Shishtawy, Tarek, El-Ghannam, Fatma

arXiv.org Artificial Intelligence, Jun-23-2012

This paper describes a computationally inexpensive and efficient generic summarization algorithm for Arabic texts. The algorithm belongs to the extractive summarization family, which reduces the problem to two sub-problems: identifying representative sentences and extracting them. Important keyphrases of the document to be summarized are identified using combinations of statistical and linguistic features. The sentence extraction algorithm uses keyphrases as the primary attributes for ranking a sentence. The experimental work demonstrates different techniques for achieving various summarization goals, including informative richness, coverage of both main and auxiliary topics, and minimal redundancy. A scoring scheme is then adopted that balances these summarization goals. To compare the resulting Arabic summaries with those of well-established systems, aligned English/Arabic texts are used throughout the experiments.
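
Keyphrase-driven sentence ranking with a redundancy penalty, the general idea the abstract describes, can be sketched as a greedy selection. This is not KPAS's scoring scheme: the keyphrase weights are assumed to be given, matching is naive substring search (real Arabic text would need proper tokenization and morphology), and `penalty` is a hypothetical parameter.

```python
def _gain(sentence, keyphrase_weights, covered, penalty):
    """Sum the weights of keyphrases in `sentence`; already-covered ones count less."""
    text = sentence.lower()
    return sum(
        weight * (penalty if phrase in covered else 1.0)
        for phrase, weight in keyphrase_weights.items()
        if phrase in text
    )

def summarize(sentences, keyphrase_weights, max_sentences=2, penalty=0.5):
    """Greedy extractive summary: repeatedly pick the sentence with the highest
    keyphrase gain, then discount the keyphrases it covers to limit redundancy.
    Selected sentences are returned in their original document order."""
    remaining = list(range(len(sentences)))
    selected, covered = [], set()
    while remaining and len(selected) < max_sentences:
        best = max(remaining, key=lambda i: _gain(sentences[i], keyphrase_weights, covered, penalty))
        selected.append(best)
        remaining.remove(best)
        covered.update(p for p in keyphrase_weights if p in sentences[best].lower())
    return [sentences[i] for i in sorted(selected)]
```

Setting `penalty` to 1.0 ignores redundancy entirely; lowering it toward 0 pushes the summary toward covering auxiliary topics instead of repeating the main one.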

information retrieval, machine learning, natural language

1206.5384

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Comparison-Based Learning with Rank Nets

Karbasi, Amin, Ioannidis, Stratis, Massoulié, Laurent

arXiv.org Machine Learning, Jun-18-2012

We consider the problem of search through comparisons, where a user is presented with two candidate objects and reveals which is closer to her intended target. We study adaptive strategies for finding the target that require knowledge of rank relationships, but not actual distances, between objects. We propose a new strategy based on rank nets and show that, for target distributions with a bounded doubling constant, it finds the target in a number of comparisons close to the entropy of the target distribution and, hence, close to the optimum. We extend these results to the case of noisy oracles and compare this strategy to prior art over multiple datasets.
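
The comparison-only setting can be illustrated with a naive elimination baseline. This is not the rank-net strategy: pivots are chosen trivially, the oracle is assumed noiseless, and there is no entropy guarantee. It does show that search can proceed using only rank relationships, since each candidate is kept or dropped by asking whether it is closer to one pivot than to the other.

```python
import math

def comparison_search(objects, oracle, distance):
    """Locate a hidden target using only answers to "is the target closer to a or b?".

    After each answer, keep only candidates that are at least as close to the
    winning pivot as to the losing one. The true target always survives, and the
    losing pivot never does, so each query shrinks the candidate set.
    Assumes distinct objects and a truthful (noiseless) oracle."""
    candidates = list(objects)
    queries = 0
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]  # naive pivot choice; rank nets choose pivots far more carefully
        queries += 1
        winner, loser = (a, b) if oracle(a, b) else (b, a)
        candidates = [x for x in candidates if distance(x, winner) <= distance(x, loser)]
    return candidates[0], queries
```

In the usage below, `oracle(a, b)` stands in for the user and returns `True` when the hidden target is at least as close to `a` as to `b`; the point grid and target are hypothetical.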

information retrieval, machine learning, natural language

1206.4674

Country:

Europe (0.93)
North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.31)

Bayesian Locality Sensitive Hashing for Fast Similarity Search

Satuluri, Venu, Parthasarathy, Srinivasan

arXiv.org Artificial Intelligence, Mar-28-2012

Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a user-specified threshold. Methods based on locality-sensitive hashing (LSH) have become a very popular approach to this problem. However, most such methods use LSH only for the first phase of similarity search, i.e., efficient indexing for candidate generation. In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search: performing candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. BayesLSH quickly prunes away a large majority of the false positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, in terms of both accuracy and recall. Finally, unlike standard approaches, the quality of BayesLSH's output can be easily tuned and does not require manually setting the number of hashes to use for similarity estimation. For two state-of-the-art candidate generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups, typically in the range of 2x to 20x for a wide variety of datasets.
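
The pruning idea can be sketched for Jaccard similarity with MinHash, assuming a uniform prior: each MinHash value agrees with probability equal to the similarity $s$, so after $m$ agreements in $n$ hashes the posterior is $\mathrm{Beta}(m+1, n-m+1)$, and a candidate pair can be dropped once $P(s \ge t)$ is small. This is an illustration in the spirit of BayesLSH, not its implementation; the paper's method is more general, and the hash family, `batch`, and `epsilon` here are illustrative choices.

```python
import math
import random

PRIME = (1 << 61) - 1  # Mersenne prime used for the universal hash family below

def prob_similarity_above(matches, n_hashes, threshold):
    """Posterior P(s >= threshold) for Jaccard similarity s, given that `matches`
    of `n_hashes` MinHash values agree, under a uniform prior on s.
    The posterior is Beta(matches + 1, n_hashes - matches + 1); for integer
    parameters its upper tail equals P(Binomial(n_hashes + 1, threshold) <= matches)."""
    n = n_hashes + 1
    return sum(
        math.comb(n, j) * threshold ** j * (1 - threshold) ** (n - j)
        for j in range(matches + 1)
    )

def make_hash_funcs(k, seed=0):
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def minhash(items, funcs):
    """MinHash signature of a set of non-negative integers smaller than PRIME."""
    return [min((a * x + b) % PRIME for x in items) for a, b in funcs]

def bayes_prune(sig_a, sig_b, threshold, epsilon=0.01, batch=8):
    """Compare signatures in batches; prune the pair (return None) as soon as
    P(s >= threshold) drops below epsilon, else return the MinHash estimate of s."""
    matches = 0
    for n, (ha, hb) in enumerate(zip(sig_a, sig_b), start=1):
        matches += ha == hb
        if n % batch == 0 and prob_similarity_above(matches, n, threshold) < epsilon:
            return None  # pruned as a likely false positive
    return matches / len(sig_a)
```

A pair with no agreements after the first batch of 8 hashes is pruned at threshold 0.7, since $0.3^9 \approx 2 \times 10^{-5}$, so most dissimilar candidates cost only a handful of hash comparisons.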

data mining, information retrieval, machine learning

1110.1328

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Ohio (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)

SearchBuddies: Bringing Search Engines into the Conversation

Hecht, Brent (Northwestern University) | Teevan, Jaime (Microsoft Research) | Morris, Meredith Ringel (Microsoft Research) | Liebling, Dan (Microsoft Research)

AAAI Conferences, Feb-22-2012

Although people receive trusted, personalized recommendations and auxiliary social benefits when they ask questions of their friends, using a search engine is often a more effective way to find an answer. Attempts to integrate social and algorithmic search have thus far focused on bringing social content into algorithmic search results. However, more of the benefits of social search can be preserved by reversing this approach and bringing algorithmic content into natural question-based conversations. To do this successfully, it is necessary to adapt search engine interaction to a social context. In this paper, we present SearchBuddies, a system that responds to Facebook status message questions with algorithmic search results. Via a three-month deployment of the system to 122 social network users, we explore how people responded to search content in a highly social environment. Our experience deploying SearchBuddies shows that a socially embedded search engine can successfully provide users with unique and highly relevant information in a social context and can be integrated into conversations around an information need. The deployment also illuminates specific challenges of embedding a search engine in a social environment and provides guidance toward solutions.

information, search engine, searchbuddy

Sixth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Hawaii (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

So.cl: An Interest Network for Informal Learning

Farnham, Shelly Diane (Microsoft Research) | Lahav, Michal (Microsoft Research) | Raskino, David (Microsoft Research) | Cheng, Lili (Microsoft Research) | Laird-McConnell, Tom (Microsoft Research)

AAAI Conferences, Feb-22-2012

Web search engines emerged before the dominance of social media. What if search were designed to integrate with social media from the ground up? So.cl is a web application that combines web browsing, search, and social networking for the purposes of sharing and learning around topics of interest. In this paper, we present the results of a deployment study examining students' existing learning practices around search and social networking, and how these practices shifted when participants adopted So.cl. We found that, before using So.cl, students already relied heavily on search tools and social media for learning. With So.cl, users engaged in lightweight, fun social sharing and learning on informal, personal topics, but not in more heavyweight collaboration around school or work. The public nature of So.cl encouraged users to post search results as much for self-expression as for searching, enabling serendipitous discovery around interests.

information, participant, student

Sixth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Washington > King County > Redmond (0.04)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (0.96)
Research Report > New Finding (0.68)

Industry:

Education (1.00)
Information Technology > Services (0.95)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Transductive Learning for Real-Time Twitter Search

Zhang, Xin (Graduate University of Chinese Academy of Sciences) | He, Ben (Graduate University of Chinese Academy of Sciences) | Luo, Tiejian (Graduate University of Chinese Academy of Sciences)

AAAI Conferences, Feb-22-2012

Recency is an important dimension of relevance for real-time Twitter search, as users tend to be interested in fresh news and events. By incorporating various sources of evidence, learning to rank (LTR) algorithms applied to real-time Twitter search have been shown to help find tweets that are not only relevant but also recent in response to given queries. However, the potential effectiveness of LTR may not have been fully exploited, because little labeled data is available for properly learning a ranking model; human labels are expensive in real-world applications. To this end, this paper proposes a transductive algorithm that incrementally aggregates labeled tweets through an iterative process. Experimental results on the standard Tweets11 dataset show that our approach outperforms strong baselines without the use of human labels.
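
The iterative, label-free idea can be sketched as pseudo-labeling: rank the tweets, treat the top of the ranking as relevant and the bottom as irrelevant, fit a model, rescore, and grow the pseudo-labeled set. This is a generic self-training sketch, not the paper's algorithm; the feature vectors, logistic model, and growth schedule (`k`, `iterations`) are hypothetical stand-ins.

```python
import math

def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=200, lr=0.1):
    """Plain stochastic-gradient logistic regression (no regularization)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = _sigmoid(sum(a * b for a, b in zip(w, xi)))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

def transductive_rerank(features, initial_scores, k=2, iterations=3):
    """Rank tweets for one query without human labels.

    Each round treats the current top-n tweets as pseudo-relevant and the
    bottom-n as pseudo-irrelevant, fits a model on them, and rescores every
    tweet; n grows by k per round, up to half the pool. Needs at least 2 tweets."""
    scores = list(initial_scores)
    for it in range(iterations):
        order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
        n = min(k * (it + 1), len(order) // 2)
        pos, neg = order[:n], order[-n:]
        w = train_logistic([features[i] for i in pos + neg], [1] * n + [0] * n)
        scores = [sum(a * b for a, b in zip(w, f)) for f in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
```

In practice `features` might hold a retrieval score and a recency signal per tweet and `initial_scores` a baseline ranking, but both are assumptions here; the returned list gives tweet indices, best first.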

information retrieval, machine learning, natural language

Sixth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)

Industry: Information Technology (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.31)

Enhancing Event Descriptions through Twitter Mining

Tanev, Hristo (Joint Research Centre, European Commission) | Ehrmann, Maud (Joint Research Centre, European Commission) | Piskorski, Jakub (Frontex) | Zavarella, Vanni (Joint Research Centre, European Commission)

AAAI Conferences, Feb-22-2012

We describe a simple IR approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We show that methods based on word co-occurrence clustering, domain-specific keywords, and named entity recognition improve performance relative to a basic approach.
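
An event-specific query can be sketched as a weighted bag of words that favors named entities over domain keywords. The paper compares several query-construction methods; this sketch shows only an entity-plus-keyword weighting, and the weights, tokenizer, `min_score` threshold, and the place name "Riverton" in the usage are hypothetical.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9#@']+", text.lower())

def build_event_query(entities, keywords, entity_weight=2.0, keyword_weight=1.0):
    """Weighted bag-of-words query: named-entity tokens count more than domain keywords."""
    query = {}
    for entity in entities:
        for tok in tokenize(entity):
            query[tok] = entity_weight
    for kw in keywords:
        for tok in tokenize(kw):
            query.setdefault(tok, keyword_weight)
    return query

def rank_tweets(query, tweets, min_score=2.0):
    """Score tweets by the total weight of query terms they contain; keep those at or above min_score."""
    scored = []
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        score = sum(w for term, w in query.items() if term in tokens)
        if score >= min_score:
            scored.append((score, tweet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored]
```

With the default weights, a tweet must mention an event entity or at least two domain keywords to be linked, which keeps generic keyword-only chatter out of the results.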

information retrieval, natural language, tweet

Sixth International AAAI Conference on Weblogs and Social Media

Country:

Asia > Middle East > Yemen (0.06)
Europe > Italy (0.05)
North America > United States > Oklahoma (0.04)

Industry:

Government > Military (0.49)
Information Technology > Services (0.49)
Media > News (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.58)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.50)