AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Computerized databases have made searching traditional indexes of print publications easier, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is held in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Scalable Probabilistic Databases with Factor Graphs and MCMC

Wick, Michael, McCallum, Andrew, Miklau, Gerome

arXiv.org Artificial Intelligence, May-11-2010

Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or restrict the class of relational algebra formulas under which they are closed. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models with structure that changes during inference. MCMC sampling provides efficiency by hypothesizing modifications to possible worlds rather than generating entire worlds from scratch. Queries are then run over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. We also demonstrate our system's ability to answer relational queries with aggregation, and demonstrate additional scalability through the use of parallelization.
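The idea in this abstract of sampling modifications to a single stored world, and updating a query answer by deltas instead of re-running it, can be illustrated with a toy sketch. This is not the authors' system: the "database" is a plain list of binary labels, the factor graph is reduced to one hypothetical unary weight per tuple, and the query is a single COUNT aggregate. The function name `run_chain` and all parameters are assumptions made for illustration.

```python
import math
import random

def run_chain(n_tuples=20, n_steps=500, seed=0):
    """Toy Metropolis-Hastings chain over one stored world, with an
    incrementally maintained view for: COUNT(*) WHERE label = 1."""
    rng = random.Random(seed)
    # Hypothetical unary log-potentials (assumption): weight w_i scores label 1.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_tuples)]
    world = [0] * n_tuples   # the single world held in the "database"
    view_count = 0           # materialized view: number of tuples with label 1
    histogram = {}           # how often each query answer was sampled

    for _ in range(n_steps):
        i = rng.randrange(n_tuples)      # propose modifying one tuple
        new_label = 1 - world[i]
        # Change in log-score caused by the flip (symmetric proposal).
        delta_score = weights[i] * (new_label - world[i])
        if rng.random() < math.exp(min(0.0, delta_score)):
            # View maintenance: apply the delta, no full re-query.
            view_count += new_label - world[i]
            world[i] = new_label
        histogram[view_count] = histogram.get(view_count, 0) + 1

    total = sum(histogram.values())
    distribution = {answer: n / total for answer, n in histogram.items()}
    return distribution, world, view_count
```

Each step touches one tuple and updates the aggregate in constant time, whereas a naive approach would recompute `sum(world)` for every sample. The returned dictionary is an empirical distribution over query answers.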

information retrieval, machine learning, natural language, (20 more...)

arXiv:1005.1934

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Learning Better Context Characterizations: An Intelligent Information Retrieval Approach

Lorenzetti, Carlos M., Maguitman, Ana G.

arXiv.org Artificial Intelligence, Apr-27-2010

This paper proposes an incremental method that can be used by an intelligent system to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. Using these terms, a set of queries is built and submitted to a search engine. New documents and terms are used to refine the learned vocabulary. Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one when constructing queries to retrieve relevant material.

information retrieval, machine learning, natural language, (16 more...)

arXiv:1004.3478

Country:

North America > United States > District of Columbia > Washington (0.04)
South America > Argentina (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Enriching a News Portal with Semantic Information: An Entity-Based Approach

Bocconi, Stefano (Elsevier Labs) | Fogarolli, Angela (University of Trento)

AAAI Conferences, Mar-22-2010

In this paper we describe the production and consumption of linked data in the scenario of the Italian news agency ANSA portal. The goal of the use case is to provide viewers of a news item with background information and links to related news articles contained on the portal. This information enrichment process is entity-based: the ANSA news archive is analyzed using Named Entity Recognition, and each detected entity is annotated with a unique identifier. These identifiers are obtained using the Entity Name Server developed within the scope of the OKKAM European project. Subsequently the news items are published on the portal using RDFa and linked to a semantic search engine that provides background information harvested from sources such as DBpedia and links to additional news sources. The presented project has the potential to contribute to Linked Data by creating and publishing a large quantity of entities, and assertions about them, drawn from the ANSA news archive.

identifier, information, news item, (15 more...)

2010 AAAI Spring Symposium Series

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)

Industry: Media > News (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.51)

Linked Data Integration for Semantic Dialogue and Backend Access

Sonntag, Daniel (German Research Center for AI (DFKI)) | Kiesel, Malte (German Research Center for AI (DFKI))

AAAI Conferences, Mar-22-2010

Over the last several years, the market for speech technology has seen significant developments (Pieraccini and Huerta 2005) and powerful commercial off-the-shelf solutions for speech recognition (ASR) or speech synthesis (TTS). Further application scenarios, more diverse and dynamic information sources, and more complex prototype systems need to be addressed in the context of QA. Dialogue-based QA allows a user to pose questions in natural speech, followed by answers presented in a concise form (Sonntag et al. 2007). We learned some lessons which we use as guidelines in the development of multimodal dialogue systems where users can combine speech and gestures when using multiple interaction devices. In earlier projects (Wahlster 2003; Reithinger et al. 2005) we integrated different sub-components to multimodal interaction systems. Other lessons served as guidelines in the development of semantic dialogue systems

artificial intelligence, information retrieval, natural language, (19 more...)

2010 AAAI Spring Symposium Series

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Communications > Web > Semantic Web (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Improving Relevancy Accessing Linked Opinion Data

Galitsky, Boris (University of Girona) | Rosa, Josep Lluis de la (University of Girona) | Dobrocsi, Gábor (University of Miskolc)

AAAI Conferences, Mar-22-2010

We introduce a search engine and information retrieval system that provides access to linked opinion data. Generalization of syntactic parse trees is introduced as a similarity measure between the subjects of textual opinions, so that they can be linked on the fly. An information extraction algorithm for automatically summarizing web pages in the format of Google sponsored links is presented. We outline the usability of the implemented system, the integrated opinion delivery environment (IODE).

artificial intelligence, information retrieval, natural language, (18 more...)

2010 AAAI Spring Symposium Series

Country:

Europe > Spain > Catalonia > Girona Province > Girona (0.04)
Europe > Hungary > Borsod-Abaúj-Zemplén County > Miskolc (0.04)
Oceania > Australia (0.04)
(4 more...)

Industry:

Banking & Finance (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.90)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

The Web as a Privacy Lab

Chow, Richard (PARC) | Fang, Ji (PARC) | Golle, Philippe (PARC) | Staddon, Jessica (PARC)

AAAI Conferences, Mar-22-2010

The privacy dangers of data proliferation on the Web are well known. Information on the Web has facilitated the deanonymization of anonymous bloggers, the de-sanitization of government records and the identification of individuals based on search engine queries. What has received less attention is Web mining in support of privacy. In this position paper we argue that the very ability of Web data to breach privacy demonstrates its value as a laboratory for the detection of privacy breaches before they happen. In addition, we argue that privacy-invasive services may become privacy-respecting by mining publicly available Web data, with little decrease in performance and efficiency.

data mining, information retrieval, machine learning, (18 more...)

2010 AAAI Spring Symposium Series

Country:

Europe > Greece > Attica > Athens (0.05)
North America > United States > New York > New York County > New York City (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.95)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
(3 more...)

From Frequency to Meaning: Vector Space Models of Semantics

Turney, P. D., Pantel, P.

Journal of Artificial Intelligence Research, Feb-27-2010

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
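As a small illustration of the first class named in this abstract, the sketch below builds a term-document matrix from raw counts and compares documents by cosine similarity. It is a minimal example under simplifying assumptions (whitespace tokenization, no weighting such as tf-idf, no dimensionality reduction); the function names are hypothetical and do not come from the survey.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, columns), where columns[j][i] is the raw
    count of vocabulary[i] in document j."""
    tokenized = [d.lower().split() for d in docs]   # naive tokenizer (assumption)
    vocab = sorted({t for tokens in tokenized for t in tokens})
    columns = []
    for tokens in tokenized:
        counts = Counter(tokens)
        columns.append([counts[t] for t in vocab])
    return vocab, columns

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = ["cats chase mice", "dogs chase cats", "stocks fell sharply"]
vocab, cols = term_document_matrix(docs)
# Documents sharing terms score higher than documents sharing none.
print(cosine(cols[0], cols[1]), cosine(cols[0], cols[2]))
```

Word-context and pair-pattern matrices follow the same pattern with different row and column definitions: rows become words or word pairs, and columns become contexts or patterns.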

doi: 10.1613/jair.2934

AI Access Foundation

10640

Country:

North America > United States > Ohio > Franklin County > Columbus (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(46 more...)

Genre: Overview (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.93)
Government (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Random Indexing K-tree

De Vries, Christopher M., De Vine, Lance, Geva, Shlomo

arXiv.org Artificial Intelligence, Feb-1-2010

The purpose of this paper is to present and analyse the combination of Random Indexing (RI) with the K-tree algorithm. Both RI and K-tree adapt to changing data and decrease the cost of computationally intensive vector based applications. This combination is particularly suitable to the representation and clustering of very large document collections. Documents are typically represented in vector space as very sparse high dimensional vectors. RI can reduce the dimensionality and sparsity of this representation. In turn, the condensed representation is highly effective when working with K-tree. The paper is focused on determining the effectiveness of using RI with K-tree through experiments and comparative analysis of results. Sections 2 to 6 discuss K-tree, Random Indexing, Document Representation, Experimental Setup and Experimental results respectively. The paper ends with a conclusion in Section 7.
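The dimensionality reduction step mentioned in this abstract can be sketched as follows. This shows one common formulation of Random Indexing, assumed here rather than taken from the paper: each term receives a fixed sparse ternary index vector, and a document is represented by the sum of the index vectors of its terms. The dimension, the number of nonzero entries, and the function names are illustrative assumptions, and the K-tree clustering step is not shown.

```python
import random

def index_vector(term, dim=64, nonzero=4, seed=0):
    """Sparse ternary index vector for a term: `nonzero` positions set
    to +1 or -1, all others 0. Seeding by term makes it reproducible."""
    rng = random.Random(f"{seed}:{term}")
    positions = rng.sample(range(dim), nonzero)
    vec = [0] * dim
    for k, p in enumerate(positions):
        vec[p] = 1 if k % 2 == 0 else -1   # half +1, half -1
    return vec

def document_vector(tokens, dim=64, nonzero=4, seed=0):
    """Dense reduced representation: sum of the terms' index vectors."""
    doc = [0] * dim
    for t in tokens:
        for j, x in enumerate(index_vector(t, dim, nonzero, seed)):
            doc[j] += x
    return doc

# The output size is fixed at `dim`, however large the vocabulary grows.
vec = document_vector("random indexing reduces dimensionality".split())
print(len(vec))
```

Because index vectors are generated on demand, new terms and documents can be added without recomputing existing vectors, which matches the abstract's point that the method adapts to changing data.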

information retrieval, machine learning, natural language, (20 more...)

arXiv:1001.0833

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

Estimating Robust Query Models with Convex Optimization

Collins-Thompson, Kevyn

Neural Information Processing Systems, Dec-31-2009

Query expansion is a long-studied approach for improving retrieval effectiveness by enhancing the user's original query with additional related words. Current algorithms for automatic query expansion can often improve retrieval accuracy on average, but are not robust: that is, they are highly unstable and have poor worst-case performance for individual queries. To address this problem, we introduce a novel formulation of query expansion as a convex optimization problem over a word graph. The model combines initial weights from a baseline feedback algorithm with edge weights based on word similarity, and integrates simple constraints to enforce set-based criteria such as aspect balance, aspect coverage, and term centrality. Results across multiple standard test collections show consistent and significant reductions in the number and magnitude of expansion failures, while retaining the strong positive gains of the baseline algorithm. Our approach does not assume a particular retrieval model, making it applicable to a broad class of existing expansion algorithms.

artificial intelligence, information retrieval, natural language, (17 more...)

Country: North America > United States (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Unsupervised Learning of Visual Sense Models for Polysemous Words

Saenko, Kate, Darrell, Trevor

Neural Information Processing Systems, Dec-31-2009

Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionary-based approach outperforms baseline methods.

information retrieval, machine learning, natural language, (20 more...)

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)
