AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought comes in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Improving Named Entity Recognition in Telephone Conversations via Effective Active Learning with Human in the Loop

Laskar, Md Tahmid Rahman, Chen, Cheng, Fu, Xue-Yong, TN, Shashi Bhushan

arXiv.org Artificial Intelligence | Nov-2-2022

Telephone transcription data can be very noisy due to speech recognition errors, disfluencies, etc. Not only is annotating such data very challenging for annotators, but the data may also contain many annotation errors even after the annotation job is completed, resulting in very poor model performance. In this paper, we present an active learning framework that leverages human-in-the-loop learning to identify data samples in the annotated dataset that are more likely to contain annotation errors and should be re-annotated. In this way, we largely reduce the need to re-annotate the whole dataset. We conduct extensive experiments with our proposed approach for Named Entity Recognition and observe that by re-annotating only about 6% of the training instances in the whole dataset, the F1 score for a certain entity type can be significantly improved by about 25%.
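The abstract does not say how candidates for re-annotation are chosen. As a minimal sketch of the general idea, the Python below assumes one plausible heuristic: flag samples where a trained model disagrees with the existing label, and send the most confident disagreements to annotators first. The function name, its parameters, and the heuristic itself are illustrative assumptions, not the authors' method; the default budget only echoes the "about 6%" figure quoted above.

```python
from typing import List


def select_for_reannotation(
    gold_labels: List[str],
    predicted_labels: List[str],
    confidences: List[float],
    budget_fraction: float = 0.06,  # assumption: mirrors the ~6% quoted in the abstract
) -> List[int]:
    """Return indices of samples to send back to human annotators.

    Hypothetical heuristic: a sample is a candidate when the model's
    prediction differs from the existing annotation; candidates are
    ranked by model confidence, highest first.
    """
    disagreements = [
        (conf, idx)
        for idx, (gold, pred, conf) in enumerate(
            zip(gold_labels, predicted_labels, confidences)
        )
        if gold != pred
    ]
    # Most confident disagreements first; ties broken by original order.
    disagreements.sort(key=lambda pair: (-pair[0], pair[1]))
    # Always allow at least one sample so tiny datasets still get a candidate.
    budget = max(1, round(budget_fraction * len(gold_labels)))
    return [idx for _, idx in disagreements[:budget]]
```

Only the selected indices go back to annotators; the rest of the dataset keeps its existing labels, which is where the saving in annotation effort would come from.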

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2211.01354

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.15)
North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Recognizing Nested Entities from Flat Supervision: A New NER Subtask, Feasibility and Challenges

Zhu, Enwei, Liu, Yiyang, Jin, Ming, Li, Jinpeng

arXiv.org Artificial Intelligence | Nov-1-2022

Many recent named entity recognition (NER) studies criticize flat NER for its non-overlapping assumption, and switch to investigating nested NER. However, existing nested NER models rely heavily on training data annotated with nested entities, and labeling such data is costly. This study proposes a new subtask, nested-from-flat NER, which corresponds to a realistic application scenario: given data annotated with flat entities only, one may still want the trained model to be capable of recognizing nested entities. To address this task, we train span-based models and deliberately ignore the spans nested inside labeled entities, since these spans are possibly unlabeled entities. With nested entities removed from the training data, our model achieves 54.8%, 54.2% and 41.1% F1 scores on the subset of spans within entities on ACE 2004, ACE 2005 and GENIA, respectively. This suggests the effectiveness of our approach and the feasibility of the task. In addition, the model's performance on flat entities is entirely unaffected. We further manually annotate the nested entities in the test set of CoNLL 2003, creating a nested-from-flat NER benchmark. Analysis results show that the main challenges stem from the data and annotation inconsistencies between the flat and nested entities.
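The span-ignoring step described in this abstract can be illustrated with a small sketch. It enumerates candidate spans for a sentence, keeps the labeled flat entities as positives, treats spans outside any entity as negatives, and drops spans nested inside a labeled entity from training. The function name, the `"O"` negative label, and the end-exclusive span convention are assumptions made for illustration; the paper's actual implementation may differ.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def build_training_spans(
    num_tokens: int,
    flat_entities: Dict[Span, str],
    max_len: Optional[int] = None,
) -> List[Tuple[Span, str]]:
    """Enumerate spans and assign training labels under flat supervision.

    Spans that lie inside a labeled entity (but are not the entity itself)
    are skipped: they may be unlabeled nested entities, so they are used
    neither as positives nor as negatives.
    """
    examples: List[Tuple[Span, str]] = []
    for start in range(num_tokens):
        for end in range(start + 1, num_tokens + 1):
            if max_len is not None and end - start > max_len:
                break
            span = (start, end)
            if span in flat_entities:
                examples.append((span, flat_entities[span]))
                continue
            nested = any(s <= start and end <= e for (s, e) in flat_entities)
            if nested:
                continue  # possibly an unlabeled nested entity: ignore
            examples.append((span, "O"))  # "O" = non-entity (assumed label)
    return examples
```

For a four-token sentence whose first three tokens form one labeled entity, the five sub-spans inside that entity are left out, so the model is never told that a potential inner entity is a negative.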

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2211.00301

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.05)
Asia > Singapore (0.04)
(19 more...)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.91)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios

Sánchez, Cinthia, Sarmiento, Hernan, Abeliuk, Andres, Pérez, Jorge, Poblete, Barbara

arXiv.org Artificial Intelligence | Oct-31-2022

Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most of the studies on this topic have focused on the analysis of data for a particular type of event in a specific language. This limits the possibility of generalizing existing approaches because models cannot be directly applied to new types of events or other languages. In this work, we study the task of automatically classifying messages that are related to crisis events by leveraging cross-language and cross-domain labeled data. Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or from new (previously unseen) types of crisis situations. For our study, we consolidated a large unified dataset from the literature, containing multiple crisis events and languages. Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian (80.0% F1-score). Furthermore, we achieve good performance for the cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our work contributes to alleviating the data scarcity problem that is so important for multilingual crisis classification. In particular, it helps mitigate cold-start situations in emergency events, when time is of the essence.

information retrieval, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2209.02139

Country:

North America > United States > Texas (0.14)
South America > Ecuador (0.05)
Europe > Italy > Abruzzo > L'Aquila Province > L'Aquila (0.04)
(14 more...)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Learning to Navigate Wikipedia by Taking Random Walks

Zaheer, Manzil, Marino, Kenneth, Grathwohl, Will, Schultz, John, Shang, Wendy, Babayan, Sheila, Ahuja, Arun, Dasgupta, Ishita, Kaeser-Chen, Christine, Fergus, Rob

arXiv.org Artificial Intelligence | Oct-31-2022

A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this paper, we show that behavioral cloning of randomly sampled trajectories is sufficient to learn an effective link selection policy. We demonstrate the approach on a graph version of Wikipedia with 38M nodes and 387M edges. The model is able to efficiently navigate between nodes 5 and 20 steps apart 96% and 92% of the time, respectively. We then use the resulting embeddings and policy in downstream fact verification and question answering tasks where, in combination with basic TF-IDF search and ranking methods, they achieve results competitive with state-of-the-art methods.
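To make "behavioral cloning of randomly sampled trajectories" concrete, here is a minimal sketch of how supervised examples might be built from random walks on a link graph. It assumes the end of each walk serves as the navigation target and that each step yields a (current node, target node, next node) example for training a link-selection policy. That construction, the function names, and the toy adjacency-list format are assumptions for illustration; the abstract does not spell out these details.

```python
import random
from typing import Dict, List, Tuple


def sample_random_walk(
    graph: Dict[str, List[str]], start: str, length: int, rng: random.Random
) -> List[str]:
    """Follow uniformly random outgoing links for up to `length` steps."""
    path = [start]
    for _ in range(length):
        neighbors = graph.get(path[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        path.append(rng.choice(neighbors))
    return path


def cloning_examples(path: List[str]) -> List[Tuple[str, str, str]]:
    """Turn one walk into (current, target, next) supervised examples.

    Assumption: the walk's final node is treated as the target, and the
    link actually taken at each step is the action to imitate.
    """
    target = path[-1]
    return [(path[i], target, path[i + 1]) for i in range(len(path) - 1)]


# Toy link graph standing in for a hyperlink graph (hypothetical data).
toy_graph = {"A": ["B"], "B": ["C"], "C": []}
walk = sample_random_walk(toy_graph, "A", length=5, rng=random.Random(0))
examples = cloning_examples(walk)
```

A policy trained on such triples learns, given the current page and a target, which outgoing link to follow, without any hand-labeled navigation data.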

information retrieval, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

2211.00177

Country:

North America > United States > Rhode Island > Providence County > Providence (0.05)
Europe > Greece (0.05)
Europe > Belgium > Wallonia (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Social Media (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)

Adversarial Retriever-Ranker for dense text retrieval

Zhang, Hang, Gong, Yeyun, Shen, Yelong, Lv, Jiancheng, Duan, Nan, Chen, Weizhu

arXiv.org Artificial Intelligence | Oct-30-2022

Current dense text retrieval models face two typical challenges. First, they adopt a siamese dual-encoder architecture to encode queries and documents independently for fast indexing and searching, while neglecting the finer-grained term-wise interactions. This results in sub-optimal recall performance. Second, their model training relies heavily on a negative sampling technique to build up the negative documents in their contrastive losses. To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker. The two models are jointly optimized according to a minimax adversarial objective: the retriever learns to retrieve negative documents to cheat the ranker, while the ranker learns to rank a collection of candidates including both the ground-truth and the retrieved ones, as well as providing progressive direct feedback to the dual-encoder retriever. Through this adversarial game, the retriever gradually produces harder negative documents to train a better ranker, whereas the cross-encoder ranker provides progressive feedback to improve the retriever. We evaluate AR2 on three benchmarks. Experimental results show that AR2 consistently and significantly outperforms existing dense retriever methods and achieves new state-of-the-art results on all of them. This includes improvements on Natural Questions R@5 to 77.9% (+2.1%), TriviaQA R@5 to 78.2% (+1.4%), and MS-MARCO MRR@10 to 39.5% (+1.3%). Code and models are available at https://github.com/microsoft/AR2.
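The results above are reported as R@5 and MRR@10. As a rough guide to what those numbers measure, the sketch below computes both for a single query. It assumes R@k is a per-query hit indicator (1 if any relevant document appears in the top k), which is a common convention for open-domain QA retrieval but is not confirmed by the abstract; the function names and document IDs are illustrative.

```python
from typing import List, Set


def hit_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """1.0 if any relevant document is in the top k results, else 0.0.

    Assumption: this per-query hit is what R@k denotes; averaging it over
    all queries gives the reported percentage.
    """
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def mrr_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k.

    Returns 0.0 when no relevant document appears in the top k. Averaging
    over queries gives MRR@k.
    """
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


# One hypothetical query: the only relevant document is ranked second.
ranking = ["d3", "d1", "d7"]
relevant = {"d1"}
```

With this ranking, the query counts as a miss at k=1 and a hit at k=2, and contributes a reciprocal rank of 0.5 to MRR@10.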

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2110.03611

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Information Management > Search (0.93)

SureDone Launches SureFit Year, Make and Model Search Engine for BigCommerce and Shopify

#artificialintelligence | Oct-28-2022, 06:35:29 GMT

SureDone has launched SureFit, a Year, Make and Model search engine built for automotive, motorsports and powersports parts and accessory sellers using BigCommerce, Shopify or SureDone's integrated storefront and shopping cart. SureFit was designed with the input of brands, enterprise companies and high-growth sellers to support fitment searches on their e-commerce websites. Visitors to parts and accessory websites want to find the specific parts that fit their vehicle. Using the SureFit Year, Make and Model Search on a website helps visitors find the parts they need and be confident they will fit. In addition, visitors will see other available parts for their vehicle, resulting in increased time on site, more multiple-part purchases and a lower cart abandonment rate.

make and model search engine, suredone, suredone launch surefit year, (9 more...)

#artificialintelligence

Genre: Financial News (0.33)

Industry: Information Technology > Services > e-Commerce Services (0.59)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Information Management > Search (0.89)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)

CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning

Wu, Zeqiu, Luan, Yi, Rashkin, Hannah, Reitter, David, Hajishirzi, Hannaneh, Ostendorf, Mari, Tomar, Gaurav Singh

arXiv.org Artificial Intelligence | Oct-28-2022

Compared to standard retrieval tasks, passage retrieval for conversational question answering (CQA) poses new challenges in understanding the current user question, as each question needs to be interpreted within the dialogue context. Moreover, it can be expensive to re-train well-established retrievers such as search engines that were originally developed for non-conversational queries. To facilitate their use, we develop a query rewriting model, CONQRR, that rewrites a conversational question in context into a standalone question. It is trained with a novel reward function to directly optimize towards retrieval using reinforcement learning and can be adapted to any off-the-shelf retriever. CONQRR achieves state-of-the-art results on a recent open-domain CQA dataset containing conversations from three different sources, and is effective for two different off-the-shelf retrievers. Our extensive analysis also shows the robustness of CONQRR to out-of-domain dialogues as well as to zero query rewriting supervision.

information retrieval, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2112.08558

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
(2 more...)

YouTube trim video tool turns videos into 6-second bumper ads – Search Engine Land

#artificialintelligence | Oct-27-2022, 04:20:46 GMT

The new tool is now available globally in Google Ads, making it easy for advertisers to create bumper ads.

search engine land, trim video tool turn video

#artificialintelligence

Industry: Media > News (0.69)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Effective and Efficient Query-aware Snippet Extraction for Web Search

Yi, Jingwei, Wu, Fangzhao, Wu, Chuhan, Huang, Xiaolong, Jiao, Binxing, Sun, Guangzhong, Xie, Xing

arXiv.org Artificial Intelligence | Oct-27-2022

Query-aware webpage snippet extraction is widely used in search engines to help users better understand the content of the returned webpages before clicking. Although important, it is very rarely studied. In this paper, we propose an effective query-aware webpage snippet extraction method named DeepQSE, which aims to select a few sentences that best summarize the webpage content in the context of the input query. DeepQSE first learns query-aware sentence representations for each sentence to capture the fine-grained relevance between query and sentence, and then learns document-aware query-sentence relevance representations for snippet extraction. Since the query and each sentence are jointly modeled in DeepQSE, its online inference may be slow. Thus, we further propose an efficient version of DeepQSE, named Efficient-DeepQSE, which can significantly improve the inference speed of DeepQSE without affecting its performance. The core idea of Efficient-DeepQSE is to decompose the query-aware snippet extraction task into two stages, i.e., a coarse-grained candidate sentence selection stage where sentence representations can be cached, and a fine-grained relevance modeling stage. Experiments on two real-world datasets validate the effectiveness and efficiency of our methods.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.08809

Country:

Asia > China (0.05)
Pacific Ocean > North Pacific Ocean > East China Sea > Yellow Sea (0.04)
Europe > Germany (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)

FlexER: Flexible Entity Resolution for Multiple Intents

Genossar, Bar, Shraga, Roee, Gal, Avigdor

arXiv.org Artificial Intelligence | Oct-26-2022

Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task, assuming the existence of a single interpretation of a real-world entity and focusing only on finding matched records, separating corresponding from non-corresponding ones, with respect to this single interpretation. However, in real-world scenarios, where entity resolution is part of a more general data project, downstream applications may have varying interpretations of real-world entities relating, for example, to various user needs. In what follows, we introduce the problem of multiple intents entity resolution (MIER), an extension to the universal (single intent) entity resolution task. As a solution, we propose FlexER, utilizing contemporary solutions to universal entity resolution tasks to solve multiple intents entity resolution. FlexER addresses the problem as a multi-label classification problem. It combines intent-based representations of tuple pairs using a multiplex graph representation that serves as an input to a graph neural network (GNN). FlexER learns intent representations and improves the outcomes of multiple resolution problems. A large-scale empirical evaluation introduces a new benchmark and, together with two well-known benchmarks, shows that FlexER effectively solves the MIER problem and outperforms the state of the art for universal entity resolution.

information retrieval, machine learning, resolution, (20 more...)

arXiv.org Artificial Intelligence

2209.07569

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
