AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Exploiting Social Annotation for Automatic Resource Discovery

Plangprasopchok, Anon, Lerman, Kristina

arXiv.org Artificial IntelligenceApr-12-2007

Information integration applications, such as mediators or mashups, that require access to information resources currently rely on users manually discovering and integrating them in the application. Manual resource discovery is a slow process, requiring the user to sift through results obtained via keyword-based search. Although search methods have advanced to include evidence from document contents, its metadata and the contents and link structure of the referring pages, they still do not adequately cover information sources -- often called "the hidden Web"-- that dynamically generate documents in response to a query. The recently popular social bookmarking sites, which allow users to annotate and share metadata about various information sources, provide rich evidence for resource discovery. In this paper, we describe a probabilistic model of the user annotation process in a social bookmarking system del.icio.us. We then use the model to automatically find resources relevant to a particular information domain. Our experimental results on data obtained from del.icio.us

artificial intelligence, information retrieval, natural language, (20 more...)

arXiv.org Artificial Intelligence

0704.1675

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
(2 more...)

Add feedback

A Comparison of Different Machine Transliteration Models

Oh, J., Choi, K., Isahara, H.

Journal of Artificial Intelligence ResearchOct-18-2006

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.

grapheme, transliteration, transliteration model, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1999

AI Access Foundation

10468

Journal of Artificial Intelligence Research

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Add feedback

An Expressive Language and Efficient Execution System for Software Agents

Barish, G., Knoblock, C. A.

Journal of Artificial Intelligence ResearchJun-1-2005

Software agents can be used to automate many of the tedious, time-consuming information processing tasks that humans currently have to complete manually. However, to do so, agent plans must be capable of representing the myriad of actions and control flows required to perform those tasks. In addition, since these tasks can require integrating multiple sources of remote information ? typically, a slow, I/O-bound process ? it is desirable to make execution as efficient as possible. To address both of these needs, we present a flexible software agent plan language and a highly parallel execution system that enable the efficient execution of expressive agent plans. The plan language allows complex tasks to be more easily expressed by providing a variety of operators for flexibly processing the data as well as supporting subplans (for modularity) and recursion (for indeterminate looping). The executor is based on a streaming dataflow model of execution to maximize the amount of operator and data parallelism possible at runtime. We have implemented both the language and executor in a system called THESEUS. Our results from testing THESEUS show that streaming dataflow execution can yield significant speedups over both traditional serial (von Neumann) as well as non-streaming dataflow-style execution that existing software and robot agent execution systems currently support. In addition, we show how plans written in the language we present can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines. Finally, we demonstrate that the increased expressivity of our plan language does not hamper performance; specifically, we show how data can be integrated from multiple remote sources just as efficiently using our architecture as is possible with a state-of-the-art streaming-dataflow network query engine.

execution, operator, relation, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1548

AI Access Foundation

10411

Journal of Artificial Intelligence Research

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Egypt > North Sinai Governorate > Arish (0.05)
(24 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Regional Government > North America Government > United States Government (0.92)
Consumer Products & Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

Add feedback

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Erkan, G., Radev, D. R.

Journal of Artificial Intelligence ResearchDec-1-2004

We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.

centrality, lexrank, summarization, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1523

AI Access Foundation

10396

Journal of Artificial Intelligence Research

Country:

Europe > United Kingdom (0.28)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)
(15 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Military (0.93)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Add feedback

Generalizing Boolean Satisfiability I: Background and Survey of Existing Work

Dixon, H. E., Ginsberg, M. L., Parkes, A. J.

Journal of Artificial Intelligence ResearchFeb-1-2004

This is the first of three planned papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high-performance solvers. The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal is to define a representation in which this structure is apparent and can easily be exploited to improve computational performance. This paper is a survey of the work underlying ZAP, and discusses previous attempts to improve the performance of the Davis-Putnam-Logemann-Loveland algorithm by exploiting the structure of the problem being solved. We examine existing ideas including extensions of the Boolean language to allow cardinality constraints, pseudo-Boolean representations, symmetry, and a limited form of quantification. While this paper is intended as a survey, our research results are contained in the two subsequent articles, with the theoretical structure of ZAP described in the second paper in this series, and ZAP's implementation described in the third.

constraint, symmetry, unit propagation, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1353

AI Access Foundation

10369

Journal of Artificial Intelligence Research

Country:

North America > United States > Oregon > Lane County > Eugene (0.28)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > Austria > Tyrol > Innsbruck (0.04)
(11 more...)

Genre: Overview (0.87)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
(4 more...)

Add feedback

Mean Field Approach to a Probabilistic Model in Information Retrieval

Wu, Bin, Wong, K., Bodoff, David

Neural Information Processing SystemsDec-31-2003

We study an explicit parametric model of documents, queries, and relevancy assessment for Information Retrieval (IR). Mean-field methods are applied to analyze the model and derive efficient practical algorithms to estimate the parameters in the problem. The hyperparameters are estimated by a fast approximate leave-one-out cross-validation procedure based on the cavity method. The algorithm is further evaluated on several benchmark databases by comparing with standard algorithms in IR.

document and query, estimation, hyperparameter, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.06)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.52)

Add feedback

Mean Field Approach to a Probabilistic Model in Information Retrieval

Wu, Bin, Wong, K., Bodoff, David

Neural Information Processing SystemsDec-31-2003

document and query, estimation, hyperparameter, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.06)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.52)

Add feedback

Mean Field Approach to a Probabilistic Model in Information Retrieval

Wu, Bin, Wong, K., Bodoff, David

Neural Information Processing SystemsDec-31-2003

We study an explicit parametric model of documents, queries, and relevancy assessmentfor Information Retrieval (IR). Mean-field methods are applied to analyze the model and derive efficient practical algorithms to estimate the parameters in the problem. The hyperparameters are estimated bya fast approximate leave-one-out cross-validation procedure based on the cavity method. The algorithm is further evaluated on several benchmark databases by comparing with standard algorithms in IR.

information retrieval, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.52)

Add feedback

IJCAI-03 Conference Highlights

Hedberg, Sara R.

AI MagazineDec-15-2003

This summer's AI conference in Acapulco offered attendees wide variety of program choices as well as ample time to catch up with friends and colleagues. For many, scheduling time was probably the biggest challenge because the conference included numerous invited speakers, 189 technical paper presentations, 93 posters, a Mobile Robot Competition, 19 Innovative Applications of AI (IAAI) award-winning paper presentations, a Trading Agents Competition, a special track on AI and the web, and the vendor exhibit.

artificial intelligence, information retrieval, natural language, (14 more...)

AI Magazine

Country:

North America > United States > Texas (0.15)
North America > United States > New York (0.14)

Genre: Press Release (0.47)

Industry: Banking & Finance > Trading (0.90)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.31)

Add feedback

The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank

Richardson, Matthew, Domingos, Pedro

Neural Information Processing SystemsDec-31-2002

Traditional information retrieval techniques can give poor results on the Web, with its vast scale and highly variable content quality. Recently, however, it was found that Web search results can be much improved by using the information contained in the link structure between pages. The two best-known algorithms which do this are HITS [1] and PageRank [2]. The latter is used in the highly successful Google search engine [3]. The heuristic underlying both of these approaches is that pages with many inlinks are more likely to be of high quality than pages with few inlinks, given that the author of a page will presumably include in it links to pages that s/he believes are of high quality.

pagerank, query, surfer, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Industry: Information Technology (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback