AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Computerized databases have aided searches of traditional indexes of print publications, but these searches still usually require time-consuming serial searching of one database after another, followed by yet other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought is spread across widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to proceed at a human's slow speed.

A Survey of Data Mining Techniques for Social Media Analysis

Adedoyin-Olowe, Mariam, Gaber, Mohamed Medhat, Stahl, Frederic

arXiv.org Artificial Intelligence, Apr-16-2014

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are increasingly interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subjects. This heavy reliance causes social network sites to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data too complex to analyse manually, so computational means of analysing them become essential. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning, and they involve data pre-processing, data analysis and data interpretation stages. This survey discusses data mining techniques used over the decades to mine diverse aspects of social networks, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.46298/jdmdh.5

1312.4617

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Pennsylvania (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(6 more...)

Natural Language Access to Enterprise Data

Waltinger, Ulli (Siemens AG) | Tecuci, Dan (Siemens Corporation) | Olteanu, Mihaela (Siemens AG) | Mocanu, Vlad (Siemens AG) | Sullivan, Sean (Siemens Energy Inc.)

AI Magazine, Apr-4-2014

This paper describes USI Answers, a natural language question answering system for enterprise data. We report on progress towards the goal of offering easy access to enterprise data to a large number of business users, most of whom are not familiar with the specific syntax or semantics of the underlying data sources. Additional complications arise from the nature of the data, which is both structured and unstructured. The proposed solution allows users to express questions in natural language, makes the system's interpretation of the query apparent, and allows easy query adjustment and reformulation. The application is in use by more than 1,500 users at Siemens Energy. We evaluate our approach on a data set consisting of fleet data.

artificial intelligence, natural language, question answering, (19 more...)

AI Magazine

Country: North America > United States > California (0.28)

Genre: Personal (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
(4 more...)

Toward computational cumulative biology by combining models of biological datasets

Faisal, Ali, Peltonen, Jaakko, Georgii, Elisabeth, Rung, Johan, Kaski, Samuel

arXiv.org Machine Learning, Apr-1-2014

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to both include biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer and the model-based search was more accurate than keyword search; it moreover recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
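The abstract does not give the form of the combination model. One simple reading, assumed here purely for illustration, treats the new dataset as a mixture of the fixed earlier models, p(x) = Σ_k w_k p_k(x), and fits only the mixing weights w_k by expectation-maximization; the earlier models are never refit, which is what keeps such a search cheap. The function name `combination_weights` and the precomputed log-likelihood matrix are hypothetical; the paper's actual model may differ.

```python
import math


def combination_weights(loglik, iters=200, tol=1e-9):
    """Hypothetical sketch: EM estimate of mixing weights over fixed earlier models.

    loglik[i][k] is the (assumed precomputed) log-likelihood of sample i of the
    new dataset under earlier model k. Only the weights w_k are updated.
    """
    n, K = len(loglik), len(loglik[0])
    w = [1.0 / K] * K  # start from uniform weights
    for _ in range(iters):
        totals = [0.0] * K
        for row in loglik:
            # E-step: responsibility of model k for this sample, computed
            # with log-sum-exp to avoid underflow.
            scores = [math.log(w[k]) + row[k] if w[k] > 0 else -math.inf
                      for k in range(K)]
            m = max(scores)
            z = sum(math.exp(s - m) for s in scores)
            for k in range(K):
                totals[k] += math.exp(scores[k] - m) / z
        # M-step: new weight = average responsibility over all samples.
        new_w = [t / n for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w
```

For example, if three samples fit model 0 far better than model 1 and one sample does the opposite, the weights converge to roughly [0.75, 0.25]; earlier datasets whose models receive large weights would be the ones reported as related.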

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

doi: 10.1371/journal.pone.0113053

1404.0329

Country:

Europe (0.93)
North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(4 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
(2 more...)

Counterfactual Estimation and Optimization of Click Metrics for Search Engines

Li, Lihong, Chen, Shunbao, Kleban, Jim, Gupta, Ankur

arXiv.org Machine Learning, Mar-12-2014

Optimizing an interactive system against a predefined online metric is particularly challenging when the metric is computed from user feedback such as clicks and payments. The key challenge is its counterfactual nature: in Web search, any change to a component of the search engine may produce a different search result page for the same query, but we normally cannot reliably infer from search logs how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback unless the new engine is run to serve users and compared with a baseline in an A/B test. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques under the contextual-bandit framework. This approach effectively allows one to run (potentially infinitely) many A/B tests offline from search logs, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component of a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques.
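The abstract names causal inference under the contextual-bandit framework without giving formulas. A standard estimator in that setting is inverse propensity scoring (IPS): the value of a new policy π is estimated from n logged triples (context x_i, action a_i, reward r_i) as V̂ = (1/n) Σ_i r_i · 1[π(x_i) = a_i] / p_i, where p_i is the probability the logging system gave to a_i. The sketch below is a generic IPS estimator, not necessarily the paper's exact method; the log format and the names `ips_estimate` and `new_policy` are assumptions. It is unbiased only if every action the new policy might choose had a positive logging probability.

```python
def ips_estimate(logs, new_policy):
    """Hypothetical sketch: inverse-propensity-scoring estimate of a click metric.

    logs: list of (context, logged_action, propensity, reward) tuples, where
          propensity is the probability the logging (production) system gave
          to logged_action in that context, and reward is e.g. 1 for a click.
    new_policy: function mapping a context to the action the candidate
          engine would choose (deterministic, for simplicity).
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, propensity, reward in logs:
        if propensity <= 0:
            raise ValueError("logged propensities must be positive")
        # Only entries where the new policy agrees with the logged action
        # contribute; dividing by the propensity corrects for how often the
        # logging system happened to pick that action.
        if new_policy(context) == action:
            total += reward / propensity
    return total / len(logs)
```

Each call acts as one offline "A/B test": swapping in a different `new_policy` re-scores the same logs without serving any users.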

artificial intelligence, information retrieval, natural language, (19 more...)

arXiv.org Machine Learning

1403.1891

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)

Pareto-depth for Multiple-query Image Retrieval

Hsiao, Ko-Jen, Calder, Jeff, Hero, Alfred O. III

arXiv.org Machine Learning, Feb-20-2014

Most content-based image retrieval systems consider either a single query, or multiple queries that include the same object or represent the same semantic information. In this paper we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method (PFM) with efficient manifold ranking (EMR). We show that our proposed algorithm outperforms state-of-the-art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.
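The Pareto front method orders database items by how many successive non-dominated "fronts" must be peeled away before they appear, given their dissimilarities to each query image. A minimal sketch of that peeling step follows, assuming the per-query dissimilarities are already computed (the paper obtains them with EMR, which is not reproduced here); the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b on every query and strictly better on one
    (smaller dissimilarity is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_fronts(points):
    """Peel successive Pareto fronts; returns a list of fronts of item indices.

    points[i] holds the dissimilarities of database item i to each query image.
    """
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # The current front is every remaining item that no other remaining
        # item dominates; such an item always exists, so the loop terminates.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

For items with dissimilarities (1, 5), (2, 2), (5, 1), (3, 3) and (6, 6) to two queries, the fronts are [[0, 1, 2], [3], [4]]. Item 1 lies in the middle of the first front, close to both queries at once, which is the kind of item a multiple-query search with different semantics is after. This direct version is quadratic per front and meant only to show the idea.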

data mining, information retrieval, machine learning, (20 more...)

arXiv.org Machine Learning

doi: 10.1109/TIP.2014.2378057

1402.5176

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

Shrivastava, Anshumali, Li, Ping

Neural Information Processing Systems, Dec-31-2013

We go beyond the notion of pairwise similarity and look into search problems with $k$-way similarity functions. In this paper, we focus on problems related to 3-way Jaccard similarity: $\mathcal{R}^{3way} = \frac{|S_1 \cap S_2 \cap S_3|}{|S_1 \cup S_2 \cup S_3|}$, with $S_1, S_2, S_3 \in \mathcal{C}$, where $\mathcal{C}$ is a size-$n$ collection of sets (or binary vectors). We show that approximate $\mathcal{R}^{3way}$ similarity search problems admit fast algorithms with provable guarantees, analogous to the pairwise case. Our analysis and speedup guarantees naturally extend to $k$-way resemblance. In the process, we extend the traditional framework of locality sensitive hashing (LSH) to handle higher-order similarities, which could be of independent theoretical interest. The applicability of $\mathcal{R}^{3way}$ search is shown on the "Google Sets" application. In addition, we demonstrate the advantage of $\mathcal{R}^{3way}$ resemblance over the pairwise case in improving retrieval quality.
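The 3-way resemblance can be computed exactly from its definition, and it also has a minwise-hashing interpretation: under a uniformly random permutation of the elements, the minimum of $S_1 \cup S_2 \cup S_3$ is equally likely to be any element of the union, and all three sets share the same minimum exactly when that element lies in $S_1 \cap S_2 \cap S_3$, so the agreement probability equals $\mathcal{R}^{3way}$. The sketch below shows only this estimator, with explicit permutations for clarity; the paper's LSH indexing and speedup guarantees build on such hashes but are not reproduced, and the function names are illustrative.

```python
import random


def three_way_jaccard(s1, s2, s3):
    """Exact R^{3way} = |S1 ∩ S2 ∩ S3| / |S1 ∪ S2 ∪ S3| (0 for three empty sets)."""
    union = s1 | s2 | s3
    return len(s1 & s2 & s3) / len(union) if union else 0.0


def minhash_estimate(s1, s2, s3, num_perms=500, seed=0):
    """Estimate R^{3way} as the fraction of random permutations under which
    all three sets have the same minimum element."""
    if not (s1 and s2 and s3):
        return 0.0  # an empty set makes the intersection, hence R^{3way}, zero
    rng = random.Random(seed)
    universe = sorted(s1 | s2 | s3)  # elements outside the union never matter
    agree = 0
    for _ in range(num_perms):
        perm = universe[:]
        rng.shuffle(perm)
        rank = {x: r for r, x in enumerate(perm)}
        m1, m2, m3 = (min(s, key=rank.__getitem__) for s in (s1, s2, s3))
        if m1 == m2 == m3:
            agree += 1
    return agree / num_perms
```

For S1 = {1, 2, 3, 4}, S2 = {2, 3, 4, 5} and S3 = {3, 4, 5, 6}, the exact value is 2/6 = 1/3, and the estimate approaches it as `num_perms` grows. Practical LSH replaces the explicit shuffles with hash functions.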

log 1, resemblance, similarity, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
North America > United States > Texas > Dallas County > Dallas (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
(9 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Which Space Partitioning Tree to Use for Search?

Ram, Parikshit, Gray, Alexander

Neural Information Processing Systems, Dec-31-2013

We consider the task of nearest-neighbor search with the class of binary-space-partitioning trees, which includes kd-trees, principal axis trees and random projection trees, and try to rigorously answer the question "which tree should be used for nearest-neighbor search?" To this end, we present theoretical results which imply that trees with better vector quantization performance have better search performance guarantees. We also explore another factor affecting search performance: the margins of the partitions in these trees. We demonstrate, both theoretically and empirically, that large-margin partitions can improve the search performance of a space-partitioning tree.
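As a concrete member of the tree class discussed above, here is a minimal textbook kd-tree (median split, cycling through coordinate axes) with exact branch-and-bound nearest-neighbor search. It is an illustrative sketch, not the paper's code, and the helper names are hypothetical. Principal axis trees or random projection trees would keep the same search routine but split along a principal direction or a random direction instead of a coordinate axis.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree, splitting on axis depth % d at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),       # coordinates <= split
        "right": build_kdtree(pts[mid + 1:], depth + 1),  # coordinates >= split
    }


def nearest(node, query, best=None):
    """Exact nearest-neighbor search; best is a (distance, point) pair."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting hyperplane is closer than the
    # best distance found so far; this pruning is what makes the tree fast.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

For the points (2, 3), (5, 4), (9, 6), (4, 7), (8, 1) and (7, 2), a query at (9, 2) returns (8, 1). How often the pruning test fails, forcing a visit to the far side, is exactly what depends on how well the partitions fit the data, which is the question the paper analyzes.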

information retrieval, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

ARTigo: Building an Artwork Search Engine With Games and Higher-Order Latent Semantic Analysis

Wieser, Christoph (University of Munich) | Bry, François (University of Munich) | Bérard, Alexandre (Institut National des Sciences Appliquées Rennes) | Lagrange, Richard (Institut National des Sciences Appliquées Rennes)

AAAI Conferences, Nov-5-2013

This article describes how a semantic search engine has been built from, and is still continuously improved by, a semantic analysis of the “footprints” left by players on the gaming Web platform ARTigo. The Web platform offers several Games With a Purpose (GWAPs), some of which have been specifically designed to collect the data needed for building the artwork search engine. ARTigo is a “tagging ecosystem” of games that cooperate to gather a wide range of information on artworks. The ARTigo ecosystem generates a folksonomy saved as a 3rd-order tensor, a generalization of a matrix whose three orders, or dimensions, represent (1) who, (2) with which tag, (3) tagged which artwork. The semantic search engine is built using a non-trivial generalization of the well-known, matrix-based Latent Semantic Analysis (LSA) methods and algorithms. ARTigo has been in service for five years and is the subject of active research that constantly results in new developments, some of which are reported here for the first time.
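To make the tensor described above concrete, the sketch below turns (player, tag, artwork) triples into a sparse 3rd-order count tensor and unfolds it along one mode into an ordinary matrix. Matricizing along each mode and taking an SVD of each unfolding is the basis of the higher-order SVD (HOSVD), one well-known way to generalize LSA to tensors; the article's own generalization may differ. The function names, the count weighting and the column ordering (one of several conventions) are assumptions made for illustration.

```python
from collections import defaultdict


def build_tensor(triples):
    """Map (player, tag, artwork) triples to a sparse 3rd-order count tensor.

    Returns the tensor as {(i, j, k): count} plus the index lists per mode.
    """
    players = sorted({p for p, _, _ in triples})
    tags = sorted({t for _, t, _ in triples})
    artworks = sorted({a for _, _, a in triples})
    pi = {p: i for i, p in enumerate(players)}
    ti = {t: j for j, t in enumerate(tags)}
    ai = {a: k for k, a in enumerate(artworks)}
    tensor = defaultdict(int)
    for p, t, a in triples:
        tensor[(pi[p], ti[t], ai[a])] += 1  # repeated taggings add up
    return dict(tensor), (players, tags, artworks)


def unfold(tensor, shape, mode):
    """Mode-n unfolding (matricization) into a dense list-of-lists matrix.

    Rows index the chosen mode; columns enumerate the other two modes.
    """
    other = [m for m in range(3) if m != mode]
    ncols = shape[other[0]] * shape[other[1]]
    matrix = [[0] * ncols for _ in range(shape[mode])]
    for idx, value in tensor.items():
        col = idx[other[0]] * shape[other[1]] + idx[other[1]]
        matrix[idx[mode]][col] += value
    return matrix
```

With the triples ("u1", "sun", "a1"), ("u2", "sun", "a1") and ("u1", "sea", "a2"), the tag-mode unfolding has one row per tag ("sea", "sun") and one column per (player, artwork) pair; an SVD of such matrices would then expose latent semantic dimensions shared by tags and artworks.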

artificial intelligence, information retrieval, natural language, (4 more...)

AAAI Conferences

First AAAI Conference on Human Computation and Crowdsourcing

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

A Framework for Adaptive Crowd Query Processing

Trushkowsky, Beth (University of California, Berkeley) | Kraska, Tim (Brown University) | Franklin, Michael J. (University of California, Berkeley)

AAAI Conferences, Nov-5-2013

Search engines can yield poor results for information retrieval tasks when they cannot interpret query predicates. Such predicates are better left for humans to evaluate. We propose an adaptive processing framework for deciding (a) which parts of a query should be processed by machines and (b) the order in which the crowd should process the remaining parts, optimizing for result quality and processing cost. We describe an algorithm and experimental results for the first framework component.

adaptive crowd query processing, artificial intelligence, information retrieval, (1 more...)

AAAI Conferences

First AAAI Conference on Human Computation and Crowdsourcing

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)

DataSift: An Expressive and Accurate Crowd-Powered Search Toolkit

Parameswaran, Aditya (Stanford University) | Teh, Ming Han (Stanford University) | Garcia-Molina, Hector (Stanford University) | Widom, Jennifer (Stanford University)

AAAI Conferences, Nov-5-2013

Traditional information retrieval systems have limited functionality. For instance, they cannot adequately support queries containing non-textual fragments such as images or videos, queries that are very long or ambiguous, or semantically rich queries over non-textual corpora. In this paper, we present DataSift, an expressive and accurate crowd-powered search toolkit that can connect to any corpus. We provide a number of alternative configurations for DataSift using crowdsourced and automated components, and demonstrate gains of 2–3x in precision over traditional retrieval schemes in experiments on real corpora. We also present our results on determining suitable values for parameters in those configurations, along with a number of interesting insights learned along the way.

accurate crowd-powered search toolkit, artificial intelligence, natural language, (2 more...)

AAAI Conferences

First AAAI Conference on Human Computation and Crowdsourcing

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)
