AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Ranking Tweets by Labeled and Collaboratively Selected Pairs with Transitive Closure

Liu, Shenghua (Chinese Academy of Sciences) | Cheng, Xueqi (Chinese Academy of Sciences) | Li, Fangtao (Google Inc.)

AAAI ConferencesJul-14-2014

Tweets ranking is important for information acquisition in Microblog. Due to the content sparsity and lackof labeled data, it is better to employ semi-supervisedlearning methods to utilize the unlabeled data. However,most of previous semi-supervised learning methods donot consider the pair conflict problem, which means thatthe new selected unlabeled data may conflict with the labeled and previously selected data. It will hurt the learning performance a lot, if the training data contains manyconflict pairs. In this paper, we propose a new collaborative semi-supervised SVM ranking model (CSR-TC)with consideration of the order conflict. The unlabeleddata is selected based on a dynamically maintained transitive closure graph to avoid pair conflict. We also investigate the two views of features, intrinsic and contentrelevant features, for the proposed model. Extensive experiments are conducted on TREC Microblogging corpus. The results demonstrate that our proposed methodachieves significant improvement, compared to severalstate-of-the-art models.

information retrieval, machine learning, natural language, (19 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Sports > Soccer (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.76)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Add feedback

Compact Aspect Embedding for Diversified Query Expansions

Liu, Xiaohua (University of Montreal) | Bouchoucha, Arbi (University of Montreal) | Sordoni, Alessandro (University of Montreal) | Nie, Jian-Yun (University of Montreal)

AAAI ConferencesJul-14-2014

Diversified query expansion (DQE) based approaches aim to select a set of expansion terms with less redundancy among them while covering as many query aspects as possible. Recently they have experimentally demonstrate their effectiveness for the task of search result diversification. One challenge faced by existing DQE approaches is how to ensure the aspect coverage. In this paper, we propose a novel method for DQE, called compact aspect embedding, which exploits trace norm regularization to learn a low rank vector space for the query, with each eigenvector of the learnt vector space representing an aspect, and the absolute value of its corresponding eigenvalue representing the association strength of that aspect to the query. Meanwhile, each expansion term is mapped into the vector space as well. Based on this novel representation of the query aspects and expansion terms, we design a greedy selection strategy to choose a set of expansion terms to explicitly cover all possible aspects of the query.We test our method on several TREC diversification data sets, and show that our method significantly outperforms the state-of-the-art search result diversification approaches.

expansion term, machine learning, natural language, (19 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country: North America > Canada (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.71)

Add feedback

Fraudulent Support Telephone Number Identification Based on Co-Occurrence Information on the Web

Li, Xin (Tsinghua University) | Liu, Yiqun (Tsinghua University) | Zhang, Min (Tsinghua University) | Ma, Shaoping (Tsinghua University)

AAAI ConferencesJul-14-2014

"Fraudulent support phones" refers to the misleading telephone numbers placed on Web pages or other media that claim to provide services with which they are not associated. Most fraudulent support phone information is found on search engine result pages (SERPs), and such information substantially degrades the search engine user experience. In this paper, we propose an approach to identify fraudulent support telephone numbers on the Web based on the co-occurrence relations between telephone numbers that appear on SERPs. We start from a small set of seed official support phone numbers and seed fraudulent numbers. Then, we construct a co-occurrence graph according to the co-occurrence relationships of the telephone numbers that appear on Web pages. Additionally, we take the page layout information into consideration on the assumption that telephone numbers that appear in nearby page blocks should be regarded as more closely related. Finally, we develop a propagation algorithm to diffuse the trust scores of seed official support phone numbers and the distrust scores of the seed fraudulent numbers on the co-occurrence graph to detect additional fraudulent numbers. Experimental results based on over 1.5 million SERPs produced by a popular Chinese commercial search engine indicate that our approach outperforms TrustRank, Anti-TrustRank and Good-Bad Rank algorithms by achieving an AUC value of over 0.90.

information retrieval, machine learning, natural language, (18 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country: Asia > China (0.28)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Certain Answers as Objects and Knowledge

Libkin, Leonid (University of Edinburgh)

AAAI ConferencesJul-1-2014

The standard way of answering queries over incomplete databases is to compute certain answers, defined as the intersection of query answers on all complete databases that the incomplete database represents. But is this universally accepted definition correct? We argue that this "one-size-fits-all" definition can often lead to counterintuitive or just plain wrong results, and propose an alternative framework for defining certain answers. The idea of the framework is to move away from the standard, in the database literature, assumption that query results be given in the form of a database object, and to allow instead two alternative representations of answers: as objects defining all other answers, or as knowledge we can deduce with certainty about all such answers. We show that the latter is often easier to achieve than the former, that in general certain answers need not be defined as intersection, and may well contain missing information in them. We also show that with a proper choice of semantics, we can often reduce computing certain answers - as either objects or knowledge - to standard query evaluation. We describe the framework in the most general way, applicable to a variety of data models, and test it on three concrete relational semantics of incompleteness: open, closed, and weak closed world.

certain answer, object and knowledge

AAAI Conferences

Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning

Technology:

Information Technology > Databases (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.53)

Add feedback

Flexible Integration of Planning and Information Gathering

Camacho, David (Universidad Autonoma de Madrid) | Borrajo, Daniel (Universidad Autonoma de Madrid) | Molina, José M. (Universidad Autonoma de Madrid) | Aler, Ricardo (Universidad Autonoma de Madrid)

AAAI ConferencesJun-9-2014

The evolution of the electronic sources connected through wide area networks like Internet has encouraged the development of new information gathering techniques that go beyond traditional information retrieval and WEB search methods. They use advanced techniques, like planning or constraint programming, to integrate and reason about hetereogeneous information sources. In this paper we describe MAPWEB. MAPWEB is a multiagent framework that integrates planning agents and WEB information retrieval agents. The goal of this framework is to deal with problems that require planning with information to be gathered from the WEB. MAPWEB decouples planning from information gathering, by splitting a planning problem into two parts: solving an abstract problem and validating and completing the abstract solutions by means of information gathering. This decoupling allows also to address an important aspect of information gathering: the WEB is a dynamic medium and more and more companies make their information available in the WEB everyday. The MAPWEB framework can be adapted quickly to these changes by just modifying the planning domain and adding the required information gathering agents. For instance, in a travel assistant domain, if taxi companies begin to offer WEB information, it would only be necessary to add new planning operators related to traveling by taxi, for a more complete travel domain. This paper describes the MAPWEB planning process, focusing on the aforementioned flexibility aspect.

artificial intelligence, information retrieval, natural language, (3 more...)

AAAI Conferences

Sixth European Conference on Planning

Industry: Consumer Products & Services > Travel (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.53)

Add feedback

Querying Geometric Figures Using a Controlled Language, Ontological Graphs and Dependency Lattices

Haralambous, Yannis, Quaresma, Pedro

arXiv.org Artificial IntelligenceMay-13-2014

Dynamic geometry systems (DGS) have become basic tools in many areas of geometry as, for example, in education. Geometry Automated Theorem Provers (GATP) are an active area of research and are considered as being basic tools in future enhanced educational software as well as in a next generation of mechanized mathematics assistants. Recently emerged Web repositories of geometric knowledge, like TGTP and Intergeo, are an attempt to make the already vast data set of geometric knowledge widely available. Considering the large amount of geometric information already available, we face the need of a query mechanism for descriptions of geometric constructions. In this paper we discuss two approaches for describing geometric figures (declarative and procedural), and present algorithms for querying geometric figures in declaratively and procedurally described corpora, by using a DGS or a dedicated controlled natural language for queries.

logic & formal reasoning, natural language, relation, (17 more...)

arXiv.org Artificial Intelligence

1403.2194

Country:

Europe > Portugal > Coimbra > Coimbra (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France (0.04)

Genre:

Instructional Material (0.49)
Research Report (0.40)

Industry: Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.48)

Add feedback

Thresholding Classifiers to Maximize F1 Score

Lipton, Zachary Chase, Elkan, Charles, Narayanaswamy, Balakrishnan

arXiv.org Machine LearningMay-13-2014

This paper provides new insight into maximizing F1 scores in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, F1 score is widely used to measure the success of a binary classifier when one class is rare. Micro average, macro average, and per instance average F1 scores are used in multilabel classification. For any classifier that produces a real-valued output, we derive the relationship between the best achievable F1 score and the decision-making threshold that achieves this optimum. As a special case, if the classifier outputs are well-calibrated conditional probabilities, then the optimal threshold is half the optimal F1 score. As another special case, if the classifier is completely uninformative, then the optimal behavior is to classify all examples as positive. Since the actual prevalence of positive examples typically is low, this behavior can be considered undesirable. As a case study, we discuss the results, which can be surprising, of applying this procedure when predicting 26,853 labels for Medline documents.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1402.1892

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

SemMemDB: In-Database Knowledge Activation

Chen, Yang (University of Florida) | Petrovic, Milenko (Florida Institute for Human and Machine Cognition) | Clark, Micah (Florida Institute for Human and Machine Cognition)

AAAI ConferencesMay-7-2014

Semantic networks are a popular way of simulating human memory in ACT-R-like cognitive architectures. However, existing implementations fall short in their ability to efficiently work with very large networks required for full-scale simulations of human memories. In this paper, we present SemMemDB, an in-database realization of semantic networks and spreading activation. We describe a relational representation for semantic networks and an efficient SQL-based spreading activation algorithm. We provide a simple interface for users to invoke retrieval queries. The key benefits of our approach are: (1) Databases have mature query engines and optimizers that generate efficient query plans for memory activation and retrieval; (2) Databases can provide massive storage capacity to potentially support human-scale memories; (3) Spreading activation is implemented in SQL, a widely-used query language for big data analytics. We evaluate SemMemDB in a comprehensive experimental study using DBPedia, a web-scale ontology constructed from the Wikipedia corpus. The results show that our system runs over 500 times faster than previous works.

in-database knowledge activation, semmemdb

AAAI Conferences

The Twenty-Seventh International Flairs Conference

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.53)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.53)

Add feedback

Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems

Fernandez Garcia, N., Arias Fisteus, J., Sanchez Fernandez, L.

Journal of Artificial Intelligence ResearchApr-30-2014

In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to Wikipedia articles that represent the meaning of these anchors has received extensive research attention. Typically, link-to-Wikipedia systems try to find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and, later, rank these candidates to select the most appropriate one. In this ranking process the systems rely on context information obtained from the document where the anchor is mentioned and/or from Wikipedia. In this paper we center our attention in the use of Wikipedia links as context information. In particular, we offer a review of several candidate ranking approaches in the state-of-the-art that rely on Wikipedia link information. In addition, we provide a comparative empirical evaluation of the different approaches on five different corpora: the TAC 2010 corpus and four corpora built from actual Wikipedia articles and news items.

information, proceedings, query, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4129

AI Access Foundation

10877

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)

Add feedback

Fast Exact Search in Hamming Space with Multi-Index Hashing

Norouzi, Mohammad, Punjani, Ali, Fleet, David J.

arXiv.org Artificial IntelligenceApr-24-2014

There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

1307.2982

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback