AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

A Graph-Enhanced Click Model for Web Search

Lin, Jianghao, Liu, Weiwen, Dai, Xinyi, Zhang, Weinan, Li, Shuai, Tang, Ruiming, He, Xiuqiang, Hao, Jianye, Yu, Yong

arXiv.org Artificial Intelligence, Aug-22-2022

To better exploit search logs and model users' behavior patterns, numerous click models have been proposed to extract users' implicit interaction feedback. Most traditional click models are based on the probabilistic graphical model (PGM) framework, which requires manually designed dependencies and may oversimplify user behaviors. Recently, methods based on neural networks have been proposed to improve the prediction accuracy of user behaviors by enhancing the expressive ability and allowing flexible dependencies. However, they still suffer from the data sparsity and cold-start problems. In this paper, we propose a novel graph-enhanced click model (GraphCM) for web search. Firstly, we regard each query or document as a vertex, and propose novel homogeneous graph construction methods for queries and documents respectively, to fully exploit both intra-session and inter-session information for the sparsity and cold-start problems. Secondly, following the examination hypothesis, we separately model the attractiveness estimator and examination predictor to output the attractiveness scores and examination probabilities, where graph neural networks and neighbor interaction techniques are applied to extract the auxiliary information encoded in the pre-constructed homogeneous graphs. Finally, we apply combination functions to integrate examination probabilities and attractiveness scores into click predictions. Extensive experiments conducted on three real-world session datasets show that GraphCM not only outperforms the state-of-the-art models, but also achieves superior performance in addressing the data sparsity and cold-start problems.
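
The examination hypothesis referred to in this abstract is commonly stated as: a result is clicked only if the user examines it and finds it attractive. The following is a minimal sketch of that idea, assuming the simplest possible combination function (a plain product) and hypothetical scores. The function name and the values are illustrative assumptions; GraphCM's learned estimators and its actual combination functions are not reproduced here.

```python
def click_probability(examination: float, attractiveness: float) -> float:
    """Combine the two factors with a plain product (an assumed combination function)."""
    if not (0.0 <= examination <= 1.0 and 0.0 <= attractiveness <= 1.0):
        raise ValueError("both inputs must be probabilities in [0, 1]")
    return examination * attractiveness


# Hypothetical values for three results, listed in rank order.
examination = [0.9, 0.6, 0.3]      # chance that the user looks at each position
attractiveness = [0.5, 0.8, 0.4]   # chance that an examined result appeals to the user

clicks = [click_probability(e, a) for e, a in zip(examination, attractiveness)]
```

Under this sketch a highly attractive result at a rarely examined position can still receive a low click probability, which is why the two factors are modeled separately.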

click model, cold-start problem, information, (16 more...)

arXiv.org Artificial Intelligence

2206.08621

Country:

North America > Canada (0.05)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

[100%OFF] Search Engine Optimization Complete Specialization Course

#artificialintelligence, Aug-21-2022, 12:48:47 GMT

Welcome to the world's best specialized SEO course ever. This is the only course in the world where you will also learn about the technicalities of SEO and how to handle them. The content of this course is based on real-world practices and checklists used by professionals in the SEO world. The course focuses on how an SEO agency or freelancer approaches a website and starts the SEO work needed to rank for a particular keyword. You will understand how SEO activities affect a website's visibility in search engines.

engine optimization complete specialization course, search engine optimization

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.65)

Learning to Rank with Small Set of Ground Truth Data

Wu, Jiashu

arXiv.org Artificial Intelligence, Aug-21-2022

Over the past decades, researchers have put a lot of effort into investigating ranking techniques used to rank query results retrieved during information retrieval, or to rank the recommended products in recommender systems. In this project, we aim to investigate searching, ranking, and recommendation techniques to help realize a university academia searching platform. Unlike the usual information retrieval scenarios where lots of ground truth ranking data is present, in our case we have only limited ground truth knowledge regarding the academia ranking. For instance, given some search queries, we only know a few researchers who are highly relevant and thus should be ranked at the top, and for some other search queries, we have no knowledge at all about which researcher should be ranked at the top. The limited amount of ground truth data makes some of the conventional ranking techniques and evaluation metrics infeasible, and this was a huge challenge we faced during this project. This project enhances the user's academia searching experience to a large extent: it helps to achieve an academic searching platform that includes researchers, publications, and fields of study information, which will be beneficial not only to the university faculties but also to students' research experiences.

knowledge base, matrix, publication, (14 more...)

arXiv.org Artificial Intelligence

2207.01188

Country: Oceania > New Zealand (0.04)

Genre: Research Report (1.00)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.67)

Merchandise Recommendation for Retail Events with Word Embedding Weighted Tf-idf and Dynamic Query Expansion

Yuan, Ted Tao, Zhang, Zezhong

arXiv.org Artificial Intelligence, Aug-17-2022

We rely on item retrieval from marketplace inventory. We rank all retrieved items by the sum of tf-idf scores from matched words, and keep the items with total tf-idf scores above a threshold. With feedback to expand query scope, we discuss keyword expansion candidate selection using word embedding similarity, and an enhanced tf-idf formula for expanded words in search ranking. The retrieval-based system works well to discover relevant items.
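
The ranking rule described in this abstract (sum the tf-idf scores of matched words, then keep items whose total exceeds a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: the idf variant `log(N / df) + 1`, the toy inventory, the threshold value, and the function names are all hypothetical, and the paper's embedding-weighted tf-idf formula for expanded words is not reproduced.

```python
import math
from collections import Counter


def build_idf(docs):
    """Compute idf per term as log(N / df) + 1 (one common variant, assumed here)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per item
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def rank_items(query, docs, threshold):
    """Score each item by the summed tf-idf of matched query words; keep scores above threshold."""
    idf = build_idf(docs)
    scored = []
    for item_id, tokens in docs.items():
        tf = Counter(tokens)
        score = sum(tf[w] * idf[w] for w in set(query) if w in tf)
        if score > threshold:
            scored.append((item_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy inventory: item id -> tokenized title.
inventory = {
    "a": ["red", "christmas", "sweater"],
    "b": ["christmas", "tree", "lights"],
    "c": ["garden", "hose"],
}
ranked = rank_items(["christmas", "sweater"], inventory, threshold=0.5)
```

In this toy run, item "a" matches both query words and ranks first, item "b" matches one word, and item "c" matches nothing and is dropped by the threshold.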

keyword, merchandise recommendation, word embedding weighted tf-idf, (7 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3209978.3210202

2208.08581

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.15)
North America > United States > California > Santa Clara County > San Jose (0.05)

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.46)

Temporal Concept Drift and Alignment: An empirical approach to comparing Knowledge Organization Systems over time

Grabus, Sam, Logan, Peter Melville, Greenberg, Jane

arXiv.org Artificial Intelligence, Aug-16-2022

This research explores temporal concept drift and temporal alignment in knowledge organization systems (KOS). A comparative analysis is pursued using the 1910 Library of Congress Subject Headings, 2020 FAST Topical, and automatic indexing. The use case involves a sample of 90 nineteenth-century Encyclopedia Britannica entries. The entries were indexed using two approaches: 1) full-text indexing; 2) Named Entity Recognition was performed upon the entries with Stanza, Stanford's NLP toolkit, and entities were automatically indexed with the Helping Interdisciplinary Vocabulary application (HIVE), using both 1910 LCSH and FAST Topical. The analysis focused on three goals: 1) identifying results that were exclusive to the 1910 LCSH output; 2) identifying terms in the exclusive set that have been deprecated from the contemporary LCSH, demonstrating temporal concept drift; and 3) exploring the historical significance of these deprecated terms. Results confirm that historical vocabularies can be used to generate anachronistic subject headings representing conceptual drift across time in KOS and historical resources. A methodological contribution is made demonstrating how to study changes in KOS over time and improve the contextualization of historical humanities resources.
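
The first two analysis goals in this abstract amount to set operations over indexing output. The sketch below is a hedged illustration only: the function name is hypothetical and the headings are placeholder strings rather than real LCSH or FAST terms, and the study's actual tooling (Stanza and HIVE) is not involved.

```python
def exclusive_and_deprecated(historical_output, contemporary_output, current_vocabulary):
    """Return (terms only in the historical output, those no longer in the current vocabulary)."""
    exclusive = set(historical_output) - set(contemporary_output)
    deprecated = exclusive - set(current_vocabulary)
    return exclusive, deprecated


# Placeholder headings standing in for indexing results (not real subject headings).
historical_output = {"heading-1", "heading-2", "heading-3"}    # e.g. from a 1910 vocabulary
contemporary_output = {"heading-3", "heading-4"}               # e.g. from a 2020 vocabulary
current_vocabulary = {"heading-2", "heading-3", "heading-4"}   # terms still in use today

exclusive, deprecated = exclusive_and_deprecated(
    historical_output, contemporary_output, current_vocabulary
)
```

Here the exclusive set contains two placeholder headings, and only one of them is absent from the current vocabulary, so it is the candidate for closer historical study.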

fast topical, temporal concept drift, vocabulary version, (14 more...)

arXiv.org Artificial Intelligence

2208.07835

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods

Jäger, Sebastian, Flick, Alexander, Garcia, Jessica Adriana Sanchez, Driesch, Kaspar von den, Brendel, Karl, Biessmann, Felix

arXiv.org Artificial Intelligence, Aug-16-2022

The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in product search or recommendations of modern retail platforms. However, the lack of large, high-quality, publicly available product data with trustworthy sustainability information impedes the development of ML technology that can help to reach our sustainability goals. Here we present GreenDB, a database that collects products from European online shops on a weekly basis. As a proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs. We present initial results demonstrating that ML models trained with our data can reliably (F1 score 96%) predict the sustainability label of products. These contributions can help to complement existing e-commerce experiences and ultimately encourage users toward more sustainable consumption patterns.

dataset and benchmark, greendb, sustainability information, (10 more...)

arXiv.org Artificial Intelligence

2207.10733

Country:

Europe (0.29)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
(3 more...)

Genre: Research Report (0.71)

Industry: Consumer Products & Services (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks

Chen, Jiangui, Zhang, Ruqing, Guo, Jiafeng, Liu, Yiqun, Fan, Yixing, Cheng, Xueqi

arXiv.org Artificial Intelligence, Aug-16-2022

Knowledge-intensive language tasks (KILT) usually require a large body of information to provide correct answers. A popular paradigm to solve this problem is to combine a search system with a machine reader, where the former retrieves supporting evidence and the latter examines it to produce answers. Recently, the reader component has witnessed significant advances with the help of large-scale pre-trained generative models. Meanwhile, most existing solutions in the search component rely on the traditional "index-retrieve-then-rank" pipeline, which suffers from a large memory footprint and difficulty in end-to-end optimization. Inspired by recent efforts in constructing model-based IR models, we propose to replace the traditional multi-step search pipeline with a novel single-step generative model, which can dramatically simplify the search process and be optimized in an end-to-end manner. We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and be adopted to improve a variety of downstream KILT tasks with further fine-tuning. We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index. Empirical results show that CorpusBrain can significantly outperform strong baselines for the retrieval task on the KILT benchmark and establish new state-of-the-art downstream performance. We also show that CorpusBrain works well under zero- and low-resource settings.

corpusbrain, document identifier, retrieval, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3511808.3557271

2208.07652

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.05)
Asia > China > Beijing > Beijing (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Law (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Evaluating Dense Passage Retrieval using Transformers

Sadri, Nima

arXiv.org Artificial Intelligence, Aug-14-2022

Although representational retrieval models based on Transformers have been able to make major advances in the past few years, and despite the widely accepted conventions and best practices for testing such models, a standardized evaluation framework for testing them has not been developed. In this work, we formalize the best practices and conventions followed by researchers in the literature, paving the path for more standardized evaluations, and therefore fairer comparisons between the models. Our framework (1) embeds the documents and queries; (2) for each query-document pair, computes the relevance score based on the dot product of the document and query embedding; (3) uses the dev set of the MSMARCO dataset to evaluate the models; (4) uses the trec_eval script to calculate MRR@100, which is the primary metric used to evaluate the models. Most importantly, we showcase the use of this framework by experimenting on some of the most well-known dense retrieval models.
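
Steps (2) and (4) of the framework described in this abstract can be illustrated with a small, self-contained sketch. It assumes hand-written two-dimensional embeddings and hypothetical relevance judgments, and it recomputes MRR@k directly in Python rather than calling trec_eval, so it shows the metric's definition and not the paper's tooling. All function and variable names are illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_documents(query_vec, doc_vecs):
    """Order document ids by descending dot-product relevance score."""
    scores = {doc_id: dot(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def mrr_at_k(rankings, relevant, k=100):
    """Mean reciprocal rank of the first relevant document within the top k."""
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked in rankings.items():
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant.get(query_id, set()):
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Hypothetical toy embeddings and judgments.
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.5, 0.5]}
query_vecs = {"q1": [0.9, 0.1], "q2": [0.2, 0.8]}
relevant = {"q1": {"d3"}, "q2": {"d2"}}

rankings = {qid: rank_documents(vec, doc_vecs) for qid, vec in query_vecs.items()}
score = mrr_at_k(rankings, relevant, k=100)
```

In this toy case the relevant document for q1 lands at rank 2 and the one for q2 at rank 1, so the mean of the reciprocal ranks is 0.75.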

dataset, dense passage retrieval, msmarco dataset, (13 more...)

arXiv.org Artificial Intelligence

2208.06959

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval

Zhan, Jingtao, Ai, Qingyao, Liu, Yiqun, Mao, Jiaxin, Xie, Xiaohui, Zhang, Min, Ma, Shaoping

arXiv.org Artificial Intelligence, Aug-11-2022

Recent advances in Dense Retrieval (DR) techniques have significantly improved the effectiveness of first-stage retrieval. Trained with large-scale supervised data, DR models can encode queries and documents into a low-dimensional dense space and conduct effective semantic matching. However, previous studies have shown that the effectiveness of DR models drops by a large margin when the trained DR models are adopted in a target domain that is different from the domain of the labeled data. One of the possible reasons is that the DR model has never seen the target corpus and thus might be incapable of mitigating the difference between the training and target domains. In practice, unfortunately, training a DR model for each target domain to avoid domain shift is often a difficult task as it requires additional time, storage, and domain-specific data labeling, which are not always available. To address this problem, in this paper, we propose a novel DR framework named Disentangled Dense Retrieval (DDR) to support effective and flexible domain adaptation for DR models. DDR consists of a Relevance Estimation Module (REM) for modeling domain-invariant matching patterns and several Domain Adaption Modules (DAMs) for modeling domain-specific features of multiple target corpora. By making the REM and DAMs disentangled, DDR enables a flexible training paradigm in which the REM is trained with supervision once and the DAMs are trained with unsupervised data. Comprehensive experiments in different domains and languages show that DDR significantly improves ranking performance compared to strong DR baselines and substantially outperforms traditional retrieval methods in most scenarios.

dr model, retrieval, target domain, (14 more...)

arXiv.org Artificial Intelligence

2208.05753

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Google hit by worldwide outage as users report search engine down

The Guardian, Aug-9-2022, 02:28:21 GMT

Google experienced a major international internet outage on Tuesday, technology platforms reported. The realtime online platform Downdetector reported users had registered problems with Google explorer, the world's dominant search engine, from 2.12am BST (9.12pm EST, 11.12am AEST). As of 11.38am, there had been 4,113 confirmed reports of Google outages. User reports indicate Google has been having problems since 9:12 PM EDT. Users said sister platforms Gmail, Google Maps and Google Images were also experiencing problems.

google, outage, search engine, (2 more...)

The Guardian

Country:

South America (0.07)
Oceania > Australia (0.07)
North America > United States (0.07)
(6 more...)

Industry: Information Technology > Services (0.79)

Technology:

Information Technology > Information Management > Search (0.86)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
