AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Mining User Intents in Twitter: A Semi-Supervised Approach to Inferring Intent Categories for Tweets

Wang, Jinpeng (Peking University) | Cong, Gao (Nanyang Technological University) | Zhao, Xin Wayne (Renmin University of China) | Li, Xiaoming (Peking University)

AAAI ConferencesMar-6-2015

In this paper, we propose to study the problem of identifying and classifying tweets into intent categories. For example, a tweet “I wanna buy a new car” indicates the user’s intent for buying a car. Identifying such intent tweets will have great commercial value among others. In particular, it is important that we can distinguish different types of intent tweets. We propose to classify intent tweets into six categories, namely Food & Drink, Travel, Career & Education, Goods & Services, Event and Activities and Trifle. We propose a semisupervised learning approach to categorizing intent tweets into the six categories.We construct a test collection by using a bootstrap method. Our experimental results show that our approach is effective in inferring intent categories for tweets.

information retrieval, machine learning, natural language, (21 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Add feedback

On the Bayes-optimality of F-measure maximizers

Waegeman, Willem, Dembczynski, Krzysztof, Jachnik, Arkadiusz, Cheng, Weiwei, Hullermeier, Eyke

arXiv.org Machine LearningMar-6-2015

The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1310.4849

Country:

Europe (0.92)
North America > United States (0.27)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
(3 more...)

Add feedback

Managing large-scale scientific hypotheses as uncertain and probabilistic data

Gonçalves, Bernardo

arXiv.org Artificial IntelligenceFeb-12-2015

In view of the paradigm shift that makes science ever more data-driven, in this thesis we propose a synthesis method for encoding and managing large-scale deterministic scientific hypotheses as uncertain and probabilistic data. In the form of mathematical equations, hypotheses symmetrically relate aspects of the studied phenomena. For computing predictions, however, deterministic hypotheses can be abstracted as functions. We build upon Simon's notion of structural equations in order to efficiently extract the (so-called) causal ordering between variables, implicit in a hypothesis structure (set of mathematical equations). We show how to process the hypothesis predictive structure effectively through original algorithms for encoding it into a set of functional dependencies (fd's) and then performing causal reasoning in terms of acyclic pseudo-transitive reasoning over fd's. Such reasoning reveals important causal dependencies implicit in the hypothesis predictive data and guide our synthesis of a probabilistic database. Like in the field of graphical models in AI, such a probabilistic database should be normalized so that the uncertainty arisen from competing hypotheses is decomposed into factors and propagated properly onto predictive data by recovering its joint probability distribution through a lossless join. That is motivated as a design-theoretic principle for data-driven hypothesis management and predictive analytics. The method is applicable to both quantitative and qualitative deterministic hypotheses and demonstrated in realistic use cases from computational science.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

1501.0529

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (0.92)
Overview (0.92)
Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)

Add feedback

Learning Rank Functionals: An Empirical Study

Tran, Truyen, Phung, Dinh, Venkatesh, Svetha

arXiv.org Machine LearningFeb-7-2015

Ranking is a key aspect of many applications, such as information retrieval, question answering, ad placement and recommender systems. Learning to rank has the goal of estimating a ranking model automatically from training data. In practical settings, the task often reduces to estimating a rank functional of an object with respect to a query. In this paper, we investigate key issues in designing an effective learning to rank algorithm. These include data representation, the choice of rank functionals, the design of the loss function so that it is correlated with the rank metrics used in evaluation. For the loss function, we study three techniques: approximating the rank metric by a smooth function, decomposition of the loss into a weighted sum of element-wise losses and into a weighted sum of pairwise losses. We then present derivations of piecewise losses using the theory of high-order Markov chains and Markov random fields. In experiments, we evaluate these design aspects on two tasks: answer ranking in a Social Question Answering site, and Web Information Retrieval.

information retrieval, machine learning, question answering, (17 more...)

arXiv.org Machine Learning

1407.6089

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

Add feedback

Pairwise Rotation Hashing for High-dimensional Features

Ishikawa, Kohta, Sato, Ikuro, Ambai, Mitsuru

arXiv.org Machine LearningJan-29-2015

Binary Hashing is widely used for effective approximate nearest neighbors search. Even though various binary hashing methods have been proposed, very few methods are feasible for extremely high-dimensional features often used in visual tasks today. We propose a novel highly sparse linear hashing method based on pairwise rotations. The encoding cost of the proposed algorithm is $\mathrm{O}(n \log n)$ for n-dimensional features, whereas that of the existing state-of-the-art method is typically $\mathrm{O}(n^2)$. The proposed method is also remarkably faster in the learning phase. Along with the efficiency, the retrieval accuracy is comparable to or slightly outperforming the state-of-the-art. Pairwise rotations used in our method are formulated from an analytical study of the trade-off relationship between quantization error and entropy of binary codes. Although these hashing criteria are widely used in previous researches, its analytical behavior is rarely studied. All building blocks of our algorithm are based on the analytical solution, and it thus provides a fairly simple and efficient procedure.

information retrieval, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

1501.07422

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Add feedback

Report 79-30.pdf

AI ClassicsJan-25-2015, 21:11:42 GMT

An approach to query optimization is described that draws on two sources of knowledge: real world constraints on the values for the application domain served by the database; and knowledge about the current structure of the database and the cost of available retrieval processes. Real world knowledge is embodied in rules that are much like semantic integrity rules. The approach, called "query rephrasing", is to generate semantic equivalents of user queries that cost less to process than the original queries. The operation of a prototype system based on this approach is discussed in the context 0. simple queries which restrict a single file. The need for heuristics to limit the generation of equivalent queries is also discussed, and a method ut g "constraint thresholds" derived from

constraint, machine learning, united states department of defense, (28 more...)

AI Classics

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Industry:

Transportation > Marine (0.46)
Transportation > Freight & Logistics Services > Shipping (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.56)

Add feedback

INFERENTIAL MEMORY AS THE BASIS OF MACHINES WHICH UNDERSTAND NATURAL LANGUAGE

AI ClassicsJan-25-2015, 20:32:44 GMT

Participants in the search for intelligent machines frequently disagree on a basic question of strategy in their quest. On the one hand there are those who believe that the major obstacles can be overcome by reliance on the computer's infallible memory, electronic speed, and arithmetic capabilities uig This report takes the position that immediate, practical applica can derive from the former approach, but the major problems will be "\ To mention a single example, the implementation f information retrieval techniques on present-day computers would be a large step forward, even though the techniques thus far considered have largely been conceptually trivial. Luhn (1958) has u sed a straightforward statistical procedure to extract key sentences from scientific articles, thus yielding useful abstracts of a sort. For even an unintelligent human does more than count frequencies or search for key words. The human displays intelligent features which are generally summed up by saying that he ...

machine learning, relation, united nations, (19 more...)

AI Classics

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.48)

Add feedback

COMPUTERS THOUGHT

AI ClassicsJan-25-2015, 20:31:47 GMT

The editors have provided him with special introductory sections as a ing guide to a thorough understand of the reports.

programming languages/compilers, united nations, university of pittsburgh, (125 more...)

AI Classics

Country:

North America > United States > New York (1.01)
Europe (1.00)
North America > United States > Michigan (0.67)
North America > United States > California > Los Angeles County (0.27)

Genre:

Personal > Interview (1.00)
Overview (1.00)
Financial News (1.00)
(2 more...)

Industry:

Law (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.01)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.01)
Information Technology > Artificial Intelligence > History (1.00)
(21 more...)

Add feedback

22 Question Answering BONNIE WEBBER AND NICK WEBB

AI ClassicsJan-25-2015, 20:26:41 GMT

Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. First, questions do not often translate into a simple list of keywords. For example, the question (1) Which countries did the pope visit in the 1960s? A much more complex set of keywords is needed in order to get anywhere close to the intended result, and experience shows that people will not learn how to formulate and use such sets. Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents. While this is not much of a burden when the answer appears in a snippet and further document access is unnecessary, QA technology aims to move this from being an accidental property of search to its focus. In keyword search and in much work to date on QA technology, the information seeking process has been seen as a one-shot affair: the user asks a question, and the system provides a satisfactory response. However, early work on QA (Section 1.1) did not make this assumption, and newly targeted applications are hindered by it: while a user may try to formulate a question whose answer is the information Question Answering 631 they want, they will not know whether they have succeeded until something has been returned for examination. If what is returned is unsatisfactory or, while not the answer, is still of interest, a user needs to be able to ask further questions that are understood in the context of the previous ones. For these target applications, QA must be part of a collaborative search process (Section 3.3). In the rest of this section, we give some historical background on QA systems (Section 1.1), on dialogue systems in which QA has played a significant role (Section 1.2), and on a particular QA task that has been a major driver of the field over the past 8 years (Section 1.3). Section 2 describes the current state of the art in QA systems, organized around the de facto architecture of such systems. Section 3 discusses some current directions in which QA is moving, including the development of interactive QA.

alphabet inc., machine learning, question answering, (33 more...)

AI Classics

Country:

North America > United States (1.00)
Europe (0.93)
Asia (0.68)

Genre:

Personal (0.46)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Health & Medicine (1.00)
Consumer Products & Services (0.67)
Government > Regional Government > > > > > > > North America Government (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval

Jiang, Ke, Que, Qichao, Kulis, Brian

arXiv.org Machine LearningNov-15-2014

We present a simple but powerful reinterpretation of kernelized locality-sensitive hashing (KLSH), a general and popular method developed in the vision community for performing approximate nearest-neighbor searches in an arbitrary reproducing kernel Hilbert space (RKHS). Our new perspective is based on viewing the steps of the KLSH algorithm in an appropriately projected space, and has several key theoretical and practical benefits. First, it eliminates the problematic conceptual difficulties that are present in the existing motivation of KLSH. Second, it yields the first formal retrieval performance bounds for KLSH. Third, our analysis reveals two techniques for boosting the empirical performance of KLSH. We evaluate these extensions on several large-scale benchmark image retrieval data sets, and show that our analysis leads to improved recall performance of at least 12%, and sometimes much higher, over the standard KLSH method.

kernel, klsh, vector, (12 more...)

arXiv.org Machine Learning

1411.4199

Country:

North America > United States > Ohio (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Add feedback