AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have made it easier to search traditional indexes of print publications, but the process still usually means time-consuming serial searching of one database after another, followed by separate methods for finding internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought comes in widely different formats, and where the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition

Tan, Zeqi, Huang, Shen, Jia, Zixia, Cai, Jiong, Li, Yinghui, Lu, Weiming, Zhuang, Yueting, Tu, Kewei, Xie, Pengjun, Huang, Fei, Jiang, Yong

arXiv.org Artificial Intelligence | May-16-2023

The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios, and it inherits the semantic ambiguity and low-context setting of the MultiCoNER I task. To cope with these problems, the previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers. However, they still suffer from insufficient knowledge, limited context length, and a single retrieval strategy. In this paper, our team DAMO-NLP proposes a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER. We perform error analysis on the previous top systems and reveal that their performance bottleneck lies in insufficient knowledge. Also, we discover that the limited context length causes the retrieval knowledge to be invisible to the model. To enhance the retrieval context, we incorporate the entity-centric Wikidata knowledge base, while utilizing the infusion approach to broaden the contextual scope of the model. Also, we explore various search strategies and refine the quality of retrieval knowledge. Our system wins 9 out of 13 tracks in the MultiCoNER II shared task (we will release the dataset, code, and scripts of our system at https://github.com/modelscope/AdaSeq/tree/master/examples/U-RaNER). Additionally, we compared our system with ChatGPT, one of the large language models which have unlocked strong capabilities on many tasks. The results show that there is still much room for improvement for ChatGPT on the extraction task.

information retrieval, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.03688

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Gloucestershire (0.04)
(9 more...)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Enhancing Keyphrase Extraction from Long Scientific Documents using Graph Embeddings

Martínez-Cruz, Roberto, Mahata, Debanjan, López-López, Alvaro J., Portela, José

arXiv.org Artificial Intelligence | May-16-2023

In this study, we investigate using graph neural network (GNN) representations to enhance contextualized representations of pre-trained language models (PLMs) for keyphrase extraction from lengthy documents. We show that augmenting a PLM with graph embeddings provides a more comprehensive semantic understanding of words in a document, particularly for long documents. We construct a co-occurrence graph of the text and embed it using a graph convolutional network (GCN) trained on the task of edge prediction. We propose a graph-enhanced sequence tagging architecture that augments contextualized PLM embeddings with graph representations. Evaluating on benchmark datasets, we demonstrate that enhancing PLMs with graph embeddings outperforms state-of-the-art models on long documents, showing significant improvements in F1 scores across all the datasets. Our study highlights the potential of GNN representations as a complementary approach to improve PLM performance for keyphrase extraction from long documents.
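
As an illustration of the co-occurrence graph mentioned in the abstract above, here is a minimal sketch that counts how often two distinct words appear near each other. It is not taken from the paper: the function name `cooccurrence_edges`, the sliding-window definition of co-occurrence, and the default window size are all assumptions made for this example.

```python
from collections import Counter


def cooccurrence_edges(tokens, window=2):
    """Count undirected co-occurrences of distinct tokens.

    Two tokens co-occur when they are at most `window` positions apart.
    The window-based definition is an assumption for illustration only.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most `window` tokens from position i.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) share one edge.
                edges[tuple(sorted((left, right)))] += 1
    return edges


tokens = "graph embeddings enrich language model embeddings".split()
edges = cooccurrence_edges(tokens, window=2)
for (u, v), weight in sorted(edges.items()):
    print(u, v, weight)
```

The resulting weighted edge list is the kind of input a graph embedding model could consume; how the paper actually builds and embeds its graph is not specified here.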

information retrieval, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2305.09316

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)

Hybrid and Collaborative Passage Reranking

Zhang, Zongmeng, Zhou, Wengang, Shi, Jiaxin, Li, Houqiang

arXiv.org Artificial Intelligence | May-16-2023

In a passage retrieval system, the initial retrieval results may be unsatisfactory and can be refined by a reranking scheme. Existing solutions to passage reranking focus on enriching the interaction between the query and each passage separately, neglecting the context among the top-ranked passages in the initial retrieval list. To tackle this problem, we propose a Hybrid and Collaborative Passage Reranking (HybRank) method, which leverages the substantial similarity measurements of upstream retrievers for passage collaboration and incorporates the lexical and semantic properties of sparse and dense retrievers for reranking. Moreover, because it is built on off-the-shelf retriever features, HybRank is a plug-in reranker capable of enhancing arbitrary passage lists, including previously reranked ones. Extensive experiments demonstrate stable performance improvements over prevalent retrieval and reranking methods, and verify the effectiveness of the core components of HybRank.

hybrank, proceedings, retriever, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.findings-acl.880

2305.09313

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America (0.04)
Oceania > Australia (0.04)
(10 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Soft Prompt Decoding for Multilingual Dense Retrieval

Huang, Zhiqi, Zeng, Hansi, Zamani, Hamed, Allan, James

arXiv.org Artificial Intelligence | May-15-2023

In this work, we explore a Multilingual Information Retrieval (MLIR) task, where the collection includes documents in multiple languages. We demonstrate that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance. This is due to the heterogeneous and imbalanced nature of multilingual collections -- some languages are better represented in the collection and some benefit from large-scale training data. To address this issue, we present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space. To address the challenges of data scarcity and imbalance, we introduce a knowledge distillation strategy. The teacher model is trained on rich English retrieval data, and by leveraging bi-text data, our distillation framework transfers its retrieval knowledge to the multilingual document encoder. Therefore, our approach does not require any multilingual retrieval training data. Extensive experiments on three MLIR datasets with a total of 15 languages demonstrate that KD-SPD significantly outperforms competitive baselines in all cases. We conduct extensive analyses to show that our method has less language bias and better zero-shot transfer ability towards new languages.

information retrieval, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539618.3591769

2305.09025

Country:

North America > United States > California (0.14)
Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(8 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Defending Against Misinformation Attacks in Open-Domain Question Answering

Weller, Orion, Khan, Aleem, Weir, Nathaniel, Lawrie, Dawn, Van Durme, Benjamin

arXiv.org Artificial Intelligence | May-15-2023

Recent work in open-domain question answering (ODQA) has shown that adversarial poisoning of the search collection can cause large drops in accuracy for production systems. However, little to no work has proposed methods to defend against these attacks. To do so, we rely on the intuition that redundant information often exists in large corpora. To find it, we introduce a method that uses query augmentation to search for a diverse set of passages that could answer the original question but are less likely to have been poisoned. We integrate these new passages into the model through the design of a novel confidence method, comparing the predicted answer to its appearance in the retrieved contexts (what we call Confidence from Answer Redundancy, i.e. CAR). Together these methods allow for a simple but effective way to defend against poisoning attacks that provides gains of nearly 20% exact match across varying levels of data poisoning/knowledge conflicts.
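
The idea of comparing a predicted answer to its appearance in the retrieved contexts can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' CAR method: the function name `answer_redundancy`, the case-insensitive substring match, and the use of a simple fraction as the confidence value are all hypothetical choices.

```python
def answer_redundancy(predicted_answer, passages):
    """Fraction of retrieved passages that mention the predicted answer.

    Matching is a case-insensitive substring test, which is an
    assumption for illustration; a real system would likely normalize
    answers more carefully.
    """
    needle = predicted_answer.strip().lower()
    if not needle or not passages:
        return 0.0
    hits = sum(1 for passage in passages if needle in passage.lower())
    return hits / len(passages)


passages = [
    "Paris is the capital of France.",
    "The French capital, paris, lies on the Seine.",
    "Lyon is the third-largest city in France.",
]
print(answer_redundancy("Paris", passages))
```

A low value would indicate that few retrieved passages support the answer, which is the kind of signal a defense against poisoned passages could act on.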

information retrieval, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2212.10002

Country:

Asia > Middle East > Republic of Türkiye (0.05)
Africa > Kenya (0.05)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Government (0.94)
Health & Medicine (0.93)
Media > News (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

Google's New A.I. Search Engine Should Leave the Media Very Worried

Slate | May-13-2023, 13:00:01 GMT

This article is from Big Technology, a newsletter by Alex Kantrowitz. At Google's I/O developer conference this week, the company showed an experimental version of its search engine handling an almost unimaginably difficult query. Asked whether a family with kids under three years old and a dog would prefer Arches National Park or Bryce Canyon, Google scoured the internet and returned a lengthy, detailed answer. It noted that while only Bryce had paths that allowed dogs, kids might love the rock formations at Arches, and that Arches still had plenty of dog-friendly campgrounds, pullouts, and roads. "Now, search does the heavy lifting for you," said Google Search VP Cathy Edwards.

google, publisher, search engine, (10 more...)

Slate

Country: North America > United States > Utah > Grand County (0.25)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

NevIR: Negation in Neural Information Retrieval

Weller, Orion, Lawrie, Dawn, Van Durme, Benjamin

arXiv.org Artificial Intelligence | May-12-2023

Negation is a common everyday phenomenon and has been a consistent area of weakness for language models (LMs). Although the Information Retrieval (IR) community has adopted LMs as the backbone of modern IR architectures, there has been little to no research into how negation impacts neural IR. We therefore construct a straightforward benchmark on this theme: asking IR models to rank two documents that differ only by negation. We show that the results vary widely according to the type of IR architecture: cross-encoders perform best, followed by late-interaction models, and in last place are bi-encoder and sparse neural architectures. We find that most current information retrieval models do not consider negation, performing similarly to or worse than random ranking. We show that although the obvious approach of continued fine-tuning on a dataset of contrastive documents containing negations increases performance (as does model size), there is still a large gap between machine and human performance.
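
To make the benchmark setup concrete, the following sketch scores a retrieval model on pairs of documents that differ only by negation. It is a simplified illustration, not the paper's evaluation code: the names `pairwise_accuracy` and `overlap_score`, the triple format, and the toy word-overlap scorer are assumptions introduced here.

```python
def prefers_relevant(score, query, relevant_doc, contrast_doc):
    """True if the scorer ranks the relevant document strictly higher."""
    return score(query, relevant_doc) > score(query, contrast_doc)


def pairwise_accuracy(score, examples):
    """Fraction of (query, relevant_doc, contrast_doc) triples ranked correctly."""
    if not examples:
        return 0.0
    correct = sum(prefers_relevant(score, q, rel, con) for q, rel, con in examples)
    return correct / len(examples)


def overlap_score(query, doc):
    """Toy lexical scorer: number of distinct words shared with the query."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


affirmative = "sparrows can fly"
negated = "sparrows can not fly"
examples = [
    ("birds that can fly", affirmative, negated),
    ("birds that can not fly", negated, affirmative),
]
print(pairwise_accuracy(overlap_score, examples))
```

Any callable that takes a query and a document and returns a number can be passed as `score`, so the same harness could wrap a real retrieval model in place of the toy scorer.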

artificial intelligence, natural language, neural information retrieval, (2 more...)

arXiv.org Artificial Intelligence

2305.07614

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

AMULET: Adaptive Matrix-Multiplication-Like Tasks

Kim, Junyoung, Ross, Kenneth, Sedlar, Eric, Stadler, Lukas

arXiv.org Artificial Intelligence | May-12-2023

Many useful tasks in data science and machine learning applications can be written as simple variations of matrix multiplication. However, users have difficulty performing such tasks as existing matrix/vector libraries support only a limited class of computations hand-tuned for each unique hardware platform. Users can alternatively write the task as a simple nested loop but current compilers are not sophisticated enough to generate fast code for the task written in this way. To address these issues, we extend an open-source compiler to recognize and optimize these matrix multiplication-like tasks. Our framework, called Amulet, uses both database-style and compiler optimization techniques to generate fast code tailored to its execution environment. We show through experiments that Amulet achieves speedups on a variety of matrix multiplication-like tasks compared to existing compilers. For large matrices Amulet typically performs within 15% of hand-tuned matrix multiplication libraries, while handling a much broader class of computations.
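
As one example of what a "matrix multiplication-like" task written as a simple nested loop might look like, consider the min-plus product, where multiplication is replaced by addition and summation by minimum. The choice of this particular variation and the function name `min_plus_product` are assumptions for illustration; the abstract does not list specific tasks, and this sketch says nothing about how Amulet itself optimizes such loops.

```python
import math


def min_plus_product(a, b):
    """Min-plus 'product': c[i][j] = min over k of (a[i][k] + b[k][j]).

    Same loop structure as ordinary matrix multiplication, with
    (+, *) replaced by (min, +).
    """
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions must agree")
    c = [[math.inf] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] = min(c[i][j], a[i][k] + b[k][j])
    return c


a = [[0, 1], [2, 0]]
b = [[0, 5], [1, 0]]
print(min_plus_product(a, b))
```

A standard matrix library call cannot express this directly because the inner operations differ, which is the gap the abstract describes between hand-tuned libraries and plain nested loops.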

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2305.08872

Country:

Europe (1.00)
North America > United States > California (0.28)
North America > United States > New York > New York County > New York City (0.14)

Genre:

Workflow (0.67)
Research Report (0.50)

Technology:

Information Technology > Software (1.00)
Information Technology > Databases (1.00)
Information Technology > Data Science (1.00)
(3 more...)

Google shows the AI evolution of its search engine: What to know

Al Jazeera | May-11-2023, 10:35:10 GMT

Google has unveiled plans to infuse its dominant search engine with more advanced artificial intelligence technology. The move comes three months after Microsoft's Bing search engine started to tap into tech similar to that which powers the artificially intelligent chatbot ChatGPT. With our new generative AI experience in Search, you'll get even more from a single search. You'll be able to quickly make sense of information with an AI-powered snapshot, pointers to explore more and natural ways to ask.

ai evolution, google, search engine, (3 more...)

Al Jazeera

AI-Alerts: 2023 > 2023-05 > AAAI AI-Alert for May 16, 2023 (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.90)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

THUIR@COLIEE 2023: More Parameters and Legal Knowledge for Legal Case Entailment

Li, Haitao, Wang, Changyue, Su, Weihang, Wu, Yueyue, Ai, Qingyao, Liu, Yiqun

arXiv.org Artificial Intelligence | May-11-2023

This paper describes the approach of the THUIR team at the COLIEE 2023 Legal Case Entailment task. This task requires the participant to identify a specific paragraph from a given supporting case that entails the decision for the query case. We try traditional lexical matching methods and pre-trained language models of different sizes. Furthermore, learning-to-rank methods are employed to further improve performance. However, learning-to-rank is not very robust on this task, which suggests that answer passages cannot simply be determined with information retrieval techniques. Experimental results show that more parameters and legal knowledge contribute to the legal case entailment task. Finally, we placed third in COLIEE 2023. The implementation of our method can be found at https://github.com/CSHaitao/THUIR-COLIEE2023.

information retrieval, legal case entailment task, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2305.06817

Country:

North America > Canada (0.15)
Europe > Portugal > Braga > Braga (0.05)
Asia > China > Beijing > Beijing (0.05)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
