AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Entity Disambiguation via Fusion Entity Decoding

Wang, Junxiong, Mousavi, Ali, Attia, Omar, Pradeep, Ronak, Potdar, Saloni, Rush, Alexander M., Minhas, Umar Farooq, Li, Yunyao

arXiv.org Artificial Intelligence, May-7-2024

Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly, entity descriptions, which could contain crucial information to distinguish similar entities from each other, are often overlooked. We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions. Given text and candidate entities, the encoder learns interactions between the text and each candidate entity, producing representations for each entity candidate. The decoder then fuses the representations of entity candidates together and selects the correct entity. Our experiments, conducted on various entity disambiguation benchmarks, demonstrate the strong and robust performance of this model, particularly +1.5% in the ZELDA benchmark compared with GENRE. Furthermore, we integrate this approach into the retrieval/reader framework and observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.

benchmark, dataset, disambiguation, (14 more...)

arXiv.org Artificial Intelligence

2404.01626

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Dominican Republic (0.04)
Europe > Slovenia (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
(2 more...)

Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization

Zamani, Hamed, Bendersky, Michael

arXiv.org Artificial Intelligence, May-5-2024

This paper introduces Stochastic RAG--a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence, made in most prior work. Stochastic RAG casts the retrieval process in RAG as a stochastic sampling without replacement process. Through this formulation, we employ straight-through Gumbel-top-k that provides a differentiable approximation for sampling without replacement and enables effective end-to-end optimization for RAG. We conduct extensive experiments on seven diverse datasets on a wide range of tasks, from open-domain question answering to fact verification to slot-filling for relation extraction and to dialogue systems. By applying this optimization method to a recent and effective RAG model, we advance state-of-the-art results on six out of seven datasets.
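The sampling step named in the abstract can be illustrated with the Gumbel-top-k trick: adding independent Gumbel(0, 1) noise to each retrieval score and keeping the k largest perturbed scores yields a sample of k documents without replacement. The following is a minimal sketch of that sampling step only, assuming plain Python floats as scores; the function name `gumbel_top_k` and its parameters are hypothetical, and the straight-through gradient estimator used for end-to-end training is not shown.

```python
import math
import random


def gumbel_top_k(scores, k, rng):
    """Sample k distinct indices by perturbing scores with Gumbel noise.

    scores: list of unnormalized log-probabilities (retrieval scores).
    k:      number of documents to sample without replacement.
    rng:    a random.Random instance (hypothetical parameter, for reproducibility).
    """
    perturbed = []
    for index, score in enumerate(scores):
        # rng.random() lies in [0, 1); clamp away from 0 so log() is defined.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    # The k largest perturbed scores form the sample, in sampled order.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]
```

Calling `gumbel_top_k([2.0, 1.0, 0.5, 0.1], 2, random.Random(0))` returns two distinct document indices; higher-scoring documents are chosen more often, but every document has a nonzero chance of being sampled.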

computational linguistic, dataset, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2405.02816

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.05)
(13 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
(2 more...)

Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

Zhao, Xinran, Chen, Tong, Chen, Sihao, Zhang, Hongming, Wu, Tongshuang

arXiv.org Artificial Intelligence, May-4-2024

The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting and contradicting perspectives, for the downstream system to make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries -- beyond finding relevant documents for a claim, can retrievers distinguish supporting vs. opposing documents? We reformulate and extend six existing tasks to create a benchmark for retrieval, where we have diverse perspectives described in free-form text, in addition to root, neutral queries. We show that current retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by this observation, we further explore the potential to leverage geometric features of retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis also shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.

news article, query, retriever, (15 more...)

arXiv.org Artificial Intelligence

2405.02714

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Brazil (0.05)
Asia > India (0.05)
(43 more...)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India

Singh, Salam Michael, Garg, Shubhmoy Kumar, Misra, Amitesh, Seth, Aaditeshwar, Chakraborty, Tanmoy

arXiv.org Artificial Intelligence, May-3-2024

Sexual education aims to foster a healthy lifestyle in terms of emotional, mental and social well-being. In countries like India, adolescents form the largest demographic group and face significant vulnerabilities concerning sexual health. Unfortunately, sexual education is often stigmatized, creating barriers to providing essential counseling and information to this at-risk population. Consequently, issues such as early pregnancy, unsafe abortions, sexually transmitted infections, and sexual violence become prevalent. Our current proposal aims to provide a safe and trustworthy platform for sexual education to the vulnerable rural Indian population, thereby fostering the healthy and overall growth of the nation. In this regard, we strive towards designing SUKHSANDESH, a multi-staged AI-based Question Answering platform for sexual education tailored to rural India, adhering to safety guardrails and regional language support. By utilizing information retrieval techniques and large language models, SUKHSANDESH will deliver effective responses to user queries. We also propose to anonymise the dataset to mitigate safety risks and set AI guardrails against any harmful or unwanted response generation. Moreover, an innovative feature of our proposal involves integrating "avatar therapy" with SUKHSANDESH. This feature will convert AI-generated responses into real-time audio delivered by an animated avatar speaking regional Indian languages. This approach aims to foster empathy and connection, which is particularly beneficial for individuals with limited literacy skills. Partnering with Gram Vaani, an industry leader, we will deploy SUKHSANDESH to address sexual education needs in rural India.

india, qa system, sexual education, (15 more...)

arXiv.org Artificial Intelligence

2405.01858

Country:

Asia > India > NCT > Delhi (0.05)
Asia > Middle East > Jordan (0.04)
Asia > India > Maharashtra (0.04)
(17 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Semi-Parametric Retrieval via Binary Token Index

Zhou, Jiawei, Dong, Li, Wei, Furu, Chen, Lei

arXiv.org Artificial Intelligence, May-3-2024

The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored. To address these demands, we introduce Semi-parametric Vocabulary Disentangled Retrieval (SVDR). SVDR is a novel semi-parametric retrieval framework that supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. In our evaluation on three open-domain question answering benchmarks with the entire Wikipedia as the retrieval corpus, SVDR consistently demonstrates superiority. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index. Specifically, the adoption of a binary token index reduces index preparation time from 30 GPU hours to just 2 CPU hours and storage size from 31 GB to 2 GB, achieving a 90% reduction compared to an embedding-based index.
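To make the idea of a binary token index concrete, the toy sketch below records only whether a token occurs in a document (presence or absence, no learned weights), which is why such an index can be built on a CPU without running a neural encoder over the corpus. It is an illustration under stated assumptions, not SVDR's implementation: whitespace tokenization, the overlap-count scoring, and the names `build_binary_index` and `search` are all hypothetical stand-ins.

```python
from collections import defaultdict


def build_binary_index(docs):
    """Map each token to the set of document ids that contain it.

    Only presence is recorded, so each (token, document) pair is one bit.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in set(text.lower().split()):
            index[token].add(doc_id)
    return index


def search(index, query, top_k=1):
    """Rank documents by how many distinct query tokens they contain."""
    scores = defaultdict(int)
    for token in set(query.lower().split()):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Sort by descending overlap, breaking ties by document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

In a semi-parametric setup, the uniform per-token increment in `search` would presumably be replaced by query-side weights produced by a trained encoder; that part is outside this sketch.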

arxiv preprint arxiv, representation, retrieval, (13 more...)

arXiv.org Artificial Intelligence

2405.01924

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.83)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

TOPICAL: TOPIC Pages AutomagicaLly

Giorgi, John, Singh, Amanpreet, Downey, Doug, Feldman, Sergey, Wang, Lucy Lu

arXiv.org Artificial Intelligence, May-2-2024

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, a web app and associated open-source code, comprising a model pipeline combining retrieval, clustering, and prompting, that makes it easy for anyone to generate topic pages for a wide variety of biomedical entities on demand. In a human evaluation of 150 diverse topic pages generated using TOPICAL, we find that the vast majority were considered relevant, accurate, and coherent, with correct supporting citations. We make all code publicly available and host a free-to-use web app at: https://s2-topical.apps.allenai.org

evaluation, information, literature, (16 more...)

arXiv.org Artificial Intelligence

2405.01796

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.94)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
(2 more...)

Was YOUR Google down? Search engine hit with more than one-hour outage that impacted users worldwide

Daily Mail - Science & tech, May-1-2024, 17:19:31 GMT

Google was down for more than one hour on Wednesday. Users in the US, the UK, Australia, parts of Europe, South America and Asia reported problems with Search, the website and Google Drive. It is unclear how many users were impacted and what caused the glitch. DownDetector's outage map for the US highlighted that users had reported problems in New York City, San Francisco and parts of the Midwest. In the UK, Glasgow and Cambridge were also in the red - but America appeared to be feeling more of the outage than other nations. Americans reported that they were seeing a server error when attempting to connect to Chrome, which was also lagging for some users.

artificial intelligence, information retrieval, natural language, (8 more...)

Country:

South America (0.27)
Oceania > Australia (0.27)
North America > United States > New York (0.27)
(3 more...)

Industry: Information Technology > Services (0.95)

Technology:

Information Technology > Information Management > Search (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Google is DOWN! World's biggest search engine hit by outage plaguing thousands of users across the globe

Daily Mail - Science & tech, May-1-2024, 15:47:28 GMT

Google has been hit with a worldwide outage that is impacting thousands of users. DownDetector shows issues appeared around 11am ET, plaguing search, the website and Google Drive. Users in the US, the UK, Australia, parts of Europe, South America and Asia have reported problems with the tech giant's services. It is unclear how many users have been impacted and what caused the glitch. DownDetector's outage map for the US shows users have reported problems in New York City, San Francisco and parts of the Midwest.

artificial intelligence, information retrieval, natural language, (11 more...)

Country:

Oceania > Australia (0.31)
Asia (0.31)
South America (0.28)
(3 more...)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval

Lawrie, Dawn, Kayi, Efsun, Yang, Eugene, Mayfield, James, Oard, Douglas W.

arXiv.org Artificial Intelligence, May-1-2024

PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval. PLAID differs from ColBERT by assigning terms to clusters and representing those terms as cluster centroids plus compressed residual vectors. While PLAID is effective in batch experiments, its performance degrades in streaming settings where documents arrive over time because representations of new tokens may be poorly modeled by the earlier tokens used to select cluster centroids. PLAID Streaming Hierarchical Indexing that Runs on Terabytes of Temporal Text (PLAID SHIRTTT) addresses this concern using multi-phase incremental indexing based on hierarchical sharding. Experiments on ClueWeb09 and the multilingual NeuCLIR collection demonstrate the effectiveness of this approach both for the largest collection indexed to date by the ColBERT architecture and in the multilingual setting, respectively.
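The representation described above, a cluster centroid plus a compressed residual, and the reason it can degrade on newly arriving text, can be sketched in a few lines. This is a simplified illustration, not PLAID's actual codec: the uniform 2-bit quantizer, the `step` size, and the function names are assumptions chosen for readability.

```python
def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((v - m) ** 2 for v, m in zip(vec, centroids[c])),
    )


def encode(vec, centroids, step=0.25):
    """Store a vector as (centroid id, per-dimension 2-bit residual codes)."""
    cid = nearest_centroid(vec, centroids)
    # Quantize each residual to one of four levels: -2, -1, 0, 1 (times step).
    codes = [
        max(-2, min(1, round((v - m) / step)))
        for v, m in zip(vec, centroids[cid])
    ]
    return cid, codes


def decode(cid, codes, centroids, step=0.25):
    """Approximately reconstruct the vector from its centroid and residual codes."""
    return [m + c * step for m, c in zip(centroids[cid], codes)]


def reconstruction_error(vec, centroids, step=0.25):
    """Largest per-dimension gap between a vector and its decoded approximation."""
    cid, codes = encode(vec, centroids, step)
    approx = decode(cid, codes, centroids, step)
    return max(abs(v - a) for v, a in zip(vec, approx))
```

With centroids fitted to early data, a vector near a centroid reconstructs closely, while a vector far from every centroid exceeds the range the small residual code can express and reconstructs poorly. That gap is the streaming problem the abstract attributes to centroids selected from earlier tokens.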

plaid shirttt, proceedings, shard, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3626772.3657964

2405.00975

Country:

North America > United States > Maryland > Baltimore (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > District of Columbia > Washington (0.05)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.31)

QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims

V, Venktesh, Anand, Abhijit, Anand, Avishek, Setty, Vinay

arXiv.org Artificial Intelligence, May-1-2024

Automated fact checking has gained immense interest to tackle the growing misinformation in the digital era. Existing systems primarily focus on synthetic claims on Wikipedia, and noteworthy progress has also been made on real-world claims. In this work, we release QuanTemp, a diverse, multi-domain dataset focused exclusively on numerical claims, encompassing temporal, statistical and diverse aspects with fine-grained metadata and an evidence collection without leakage. This addresses the challenge of verifying real-world numerical claims, which are complex and often lack precise information, not addressed by existing works that mainly focus on synthetic claims. We evaluate and quantify the limitations of existing solutions for the task of verifying numerical claims. We also evaluate claim-decomposition-based methods and numerical-understanding-based models; our best baseline achieves a macro-F1 of 58.32. This demonstrates that QuanTemp serves as a challenging evaluation set for numerical claim verification.

dataset, numerical claim, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2403.17169

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > District of Columbia > Washington (0.05)
Europe > Netherlands > South Holland > Delft (0.04)
(31 more...)

Genre: Research Report > Experimental Study (0.68)

Industry:

Government (1.00)
Education (0.67)
Media > News (0.66)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(3 more...)
