AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

He, Yunzhong, Zhang, Cong, Kong, Ruoyan, Kulkarni, Chaitanya, Liu, Qing, Gandhe, Ashish, Nithianandan, Amit, Prakash, Arul

arXiv.org Artificial Intelligence, Feb-21-2023

Query categorization at customer-to-customer e-commerce platforms like Facebook Marketplace is challenging due to the vagueness of search intent, noise in real-world data, and imbalanced training data across languages. Its deployment also needs to consider challenges in scalability and downstream integration in order to translate modeling advances into better search result relevance. In this paper we present HierCat, the query categorization system at Facebook Marketplace. HierCat addresses these challenges by leveraging multi-task pre-training of dual-encoder architectures with a hierarchical inference step to effectively learn from weakly supervised training data mined from searcher engagement. We show that HierCat not only outperforms popular methods in offline experiments, but also leads to 1.4% improvement in NDCG and 4.3% increase in searcher engagement at Facebook Marketplace Search in online A/B testing.
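The hierarchical inference step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that query and category embeddings come from the two sides of a dual encoder and that categories form a two-level tree. The taxonomy, the toy vectors, the `categorize` function, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical two-level taxonomy: parent category -> child categories.
TAXONOMY = {
    "vehicles": ["cars", "motorcycles"],
    "home": ["furniture", "appliances"],
}

# Toy vectors standing in for the category-side encoder output (assumption).
CATEGORY_VECS = {
    "vehicles": [1.0, 0.0, 0.0],
    "home": [0.0, 1.0, 0.0],
    "cars": [0.9, 0.0, 0.4],
    "motorcycles": [0.9, 0.0, -0.4],
    "furniture": [0.0, 0.9, 0.4],
    "appliances": [0.0, 0.9, -0.4],
}

def categorize(query_vec, threshold=0.5):
    """Walk the taxonomy top-down, keeping the best-scoring category per level.

    Stops early when the best score falls below the threshold, so a vague
    query ends with a shorter (coarser) path or no category at all.
    """
    path = []
    candidates = list(TAXONOMY)
    while candidates:
        best = max(candidates, key=lambda c: cosine(query_vec, CATEGORY_VECS[c]))
        if cosine(query_vec, CATEGORY_VECS[best]) < threshold:
            break
        path.append(best)
        candidates = TAXONOMY.get(best, [])
    return path

# A toy query vector standing in for the query-side encoder output.
print(categorize([0.8, 0.1, 0.5]))  # ['vehicles', 'cars']
```

Restricting the second-level comparison to the children of the chosen parent is one way to keep predictions consistent with the hierarchy; whether HierCat does exactly this is not stated in the abstract.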

category, information retrieval, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3543873.3584622

2302.10527

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > New York > New York County > New York City (0.05)
(6 more...)

Genre: Research Report (0.65)

Industry: Information Technology > Services > e-Commerce Services (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace

He, Yunzhong, Tian, Yuxin, Wang, Mengjiao, Chen, Feier, Yu, Licheng, Tang, Maolong, Chen, Congcong, Zhang, Ning, Kuang, Bin, Prakash, Arul

arXiv.org Artificial Intelligence, Feb-21-2023

Embedding-based Retrieval (EBR) in e-commerce search is a powerful search retrieval technique to address semantic matches between search queries and products. However, commercial search engines like Facebook Marketplace Search are complex multi-stage systems optimized for multiple business objectives. At Facebook Marketplace, search retrieval focuses on matching search queries with relevant products, while search ranking puts more emphasis on contextual signals to up-rank the more engaging products. As a result, the end-to-end searcher experience is a function of both relevance and engagement, and the interaction between different stages of the system. This makes it challenging for EBR systems to optimize for better searcher experiences. In this paper we present Que2Engage, a search EBR system built towards bridging the gap between retrieval and ranking for end-to-end optimizations. Que2Engage takes a multimodal & multitask approach to infuse contextual information into the retrieval stage and to balance different business objectives. We show the effectiveness of our approach via a multitask evaluation framework and thorough baseline comparisons and ablation studies. Que2Engage is deployed on Facebook Marketplace Search and shows significant improvements in searcher engagement in two weeks of A/B testing.

artificial intelligence, information retrieval, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3543873.3584633

2302.11052

Country:

North America > United States > New York > New York County > New York City (0.06)
North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > District of Columbia > Washington (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Services > e-Commerce Services (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)

Neeva's AI-powered search engine showcases its sources

PCWorld, Feb-19-2023, 14:00:00 GMT

If you're concerned about publishers and content creators getting their due in the new age of AI-powered chat, you might want to adopt Neeva, a small search engine that emphasizes pushing you to its sources as much as giving you the answer. Neeva, founded by Sridhar Ramaswamy (ex-senior vice president of ads at Google) and Vivek Raghunathan (ex-vice president of monetization at YouTube), is one of the small number of search engines that are either built around AI-powered technology or have added it. While the big three include ChatGPT, Microsoft Bing, and eventually Google Bard, both You.com and Neeva have a chance to grab some attention as AI-powered search begins to grow. However, Neeva can't really be considered an AI-powered chatbot, at least not yet. Think of it as AI-powered search, or search plus a bit of AI layered on top.

ai-powered search engine showcase, neeva, neevaai, (2 more...)

PCWorld

Industry: Information Technology > Services (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

Intent Identification and Entity Extraction for Healthcare Queries in Indic Languages

Mullick, Ankan, Mondal, Ishani, Ray, Sourjyadip, Raghav, R, Chaitanya, G Sai, Goyal, Pawan

arXiv.org Artificial Intelligence, Feb-19-2023

Scarcity of data and technological limitations for resource-poor languages in developing countries like India pose a threat to the development of sophisticated NLU systems for healthcare. To assess the current status of various state-of-the-art language models in healthcare, this paper studies the problem by initially proposing two different Healthcare datasets, Indian Healthcare Query Intent-WebMD and 1mg (IHQID-WebMD and IHQID-1mg), and one real-world Indian hospital query dataset in English and multiple Indic languages (Hindi, Bengali, Tamil, Telugu, Marathi and Gujarati), which are annotated with the query intents as well as entities. Our aim is to detect query intents and extract corresponding entities. We perform extensive experiments on a set of models in various realistic settings and explore two scenarios based on the access to English data only (less costly) and access to target language data (more expensive). We analyze context-specific practical relevancy through empirical analysis. The results, expressed in terms of overall F1 score, show that our approach is practically useful to identify intents and entities.

artificial intelligence, information retrieval, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.09685

Country:

Asia > Indonesia > Bali (0.04)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

Li, Shiyang, Yavuz, Semih, Chen, Wenhu, Yan, Xifeng

arXiv.org Artificial Intelligence, Feb-19-2023

Task-adaptive pre-training (TAPT) and Self-training (ST) have emerged as the major semi-supervised approaches to improve natural language understanding (NLU) tasks with massive amounts of unlabeled data. However, it's unclear whether they learn similar representations or whether they can be effectively combined. In this paper, we show that TAPT and ST can be complementary with a simple TFS protocol that follows the TAPT -> Finetuning -> Self-training (TFS) process. Experimental results show that the TFS protocol can effectively utilize unlabeled data to achieve strong combined gains consistently across six datasets covering sentiment classification, paraphrase identification, natural language inference, named entity recognition and dialogue slot classification. We investigate various semi-supervised settings and consistently show that gains from TAPT and ST can be strongly additive by following the TFS procedure. We hope that TFS could serve as an important semi-supervised baseline for future NLP studies.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2109.06466

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.66)
Information Technology > Artificial Intelligence > Natural Language > Understanding (0.61)
(3 more...)

Lero: A Learning-to-Rank Query Optimizer

Zhu, Rong, Chen, Wei, Ding, Bolin, Chen, Xingguang, Pfadler, Andreas, Wu, Ziniu, Zhou, Jingren

arXiv.org Artificial Intelligence, Feb-19-2023

A recent line of work applies machine learning techniques to assist or rebuild cost-based query optimizers in DBMS. While exhibiting superiority in some benchmarks, their deficiencies, e.g., unstable performance, high training cost, and slow model updating, stem from the inherent hardness of predicting the cost or latency of execution plans using machine learning models. In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of a native query optimizer and continuously learns to improve the optimization performance. The key observation is that the relative order or rank of plans, rather than the exact cost or latency, is sufficient for query optimization. Lero employs a pairwise approach to train a classifier to compare any two plans and tell which one is better. Such a binary classification task is much easier than the regression task to predict the cost or latency, in terms of model efficiency and accuracy. Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native query optimizer. With its non-intrusive design, Lero can be implemented on top of any existing DBMS with minimal integration efforts. We implement Lero and demonstrate its outstanding performance using PostgreSQL. In our experiments, Lero achieves near optimal performance on several benchmarks. It reduces the plan execution time of the native optimizer in PostgreSQL by up to 70% and other learned query optimizers by up to 37%. Meanwhile, Lero continuously learns and automatically adapts to query workloads and changes in data.
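The pairwise idea described in the abstract, learning which of two plans is better instead of predicting each plan's latency, can be sketched in a few lines. This is only an illustration under stated assumptions: the linear logistic model, the two plan features, and the function names `train_pairwise` and `pick_best` are hypothetical and do not reflect Lero's actual architecture or its PostgreSQL integration.

```python
import math
from itertools import combinations

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pairwise(plans, latencies, epochs=200, lr=0.1):
    """Fit weights w so that sigmoid(w . (f_a - f_b)) estimates
    the probability that plan a runs faster than plan b."""
    dim = len(plans[0])
    w = [0.0] * dim
    pairs = []
    for i, j in combinations(range(len(plans)), 2):
        if latencies[i] == latencies[j]:
            continue  # ties carry no ordering signal
        diff = [a - b for a, b in zip(plans[i], plans[j])]
        label = 1.0 if latencies[i] < latencies[j] else 0.0
        pairs.append((diff, label))
    for _ in range(epochs):
        for diff, label in pairs:
            pred = sigmoid(sum(wk * dk for wk, dk in zip(w, diff)))
            for k in range(dim):
                # Stochastic gradient step on the logistic loss.
                w[k] += lr * (label - pred) * diff[k]
    return w

def pick_best(candidates, w):
    """Return the index of the candidate that wins the most pairwise comparisons."""
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        diff = [a - b for a, b in zip(candidates[i], candidates[j])]
        if sum(wk * dk for wk, dk in zip(w, diff)) > 0:
            wins[i] += 1
        else:
            wins[j] += 1
    return max(range(len(candidates)), key=wins.__getitem__)

# Hypothetical features per plan: [normalized rows scanned, nested-loop joins].
train_plans = [[0.1, 0], [0.5, 1], [0.9, 2], [0.3, 1], [0.7, 0]]
train_latencies = [1.0, 4.0, 9.0, 3.0, 2.5]  # made-up observed latencies

w = train_pairwise(train_plans, train_latencies)
candidates = [[0.8, 2], [0.2, 0], [0.6, 1]]
print(pick_best(candidates, w))  # 1
```

The training labels depend only on which plan in a pair was faster, not on how much faster, which is the property the abstract highlights: the model never has to regress an absolute cost.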

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.06873

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

BERT is not The Count: Learning to Match Mathematical Statements with Proofs

Li, Weixian Waylon, Ziser, Yftah, Coavoux, Maximin, Cohen, Shay B.

arXiv.org Artificial Intelligence, Feb-18-2023

We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles. We find this dataset highly representative of our task, as it consists of relatively new findings useful to mathematicians. We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively. While the first decoding method matches a proof to a statement without being aware of other statements or proofs, the second method treats the task as a global matching problem. Through a symbol replacement procedure, we analyze the "insights" that pre-trained language models have in such mathematical article analysis and show that while these models perform well on this task with the best performing mean reciprocal rank of 73.7, they follow a relatively shallow symbolic analysis and matching to achieve that performance.
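The contrast between the two decoding methods, and the mean reciprocal rank metric, can be shown on a toy score matrix. The sketch below assumes that `scores[i][j]` is the bilinear similarity between statement i and proof j and that the correct proof for statement i is proof i. The matrix values and function names are hypothetical, and the brute-force global search is a stand-in for whatever assignment solver the paper uses.

```python
from itertools import permutations

def bilinear_score(s, W, p):
    """Bilinear similarity s^T W p for vectors s, p and matrix W (list of rows)."""
    return sum(s[i] * W[i][j] * p[j] for i in range(len(s)) for j in range(len(p)))

def local_decode(scores):
    """Each statement independently picks its highest-scoring proof."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

def global_decode(scores):
    """One-to-one assignment maximizing the total score.

    Brute force over permutations: only practical for tiny inputs.
    """
    n = len(scores)
    best = max(permutations(range(n)),
               key=lambda perm: sum(scores[i][perm[i]] for i in range(n)))
    return list(best)

def mean_reciprocal_rank(scores):
    """MRR as a fraction in [0, 1], assuming proof i is correct for statement i."""
    total = 0.0
    for i, row in enumerate(scores):
        rank = 1 + sum(1 for x in row if x > row[i])
        total += 1.0 / rank
    return total / len(scores)

# Toy similarity matrix (rows: statements, columns: proofs).
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.80, 0.20],
    [0.10, 0.20, 0.70],
]

print(local_decode(scores))   # [0, 0, 2]  (proof 0 is chosen twice)
print(global_decode(scores))  # [0, 1, 2]  (each proof is used exactly once)
print(round(mean_reciprocal_rank(scores), 4))  # 0.8333
```

In this toy case local decoding assigns the same proof to two statements, while the global variant resolves the conflict by enforcing a one-to-one matching. The abstract's figure of 73.7 is presumably this metric expressed as a percentage.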

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2302.0935

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(7 more...)

Genre: Research Report > New Finding (0.88)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos

Sun, Xin, Wang, Xuan, Gao, Jialin, Liu, Qiong, Zhou, Xi

arXiv.org Artificial Intelligence, Feb-18-2023

Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video moment in an untrimmed video given a sentence description. Previous methods tend to perform self-modal learning and cross-modal interaction in a coarse manner, which neglects fine-grained clues contained in video content, query context, and their alignment. To this end, we propose a novel Multi-Granularity Perception Network (MGPN) that perceives intra-modality and inter-modality information at a multi-granularity level. Specifically, we formulate moment retrieval as a multi-choice reading comprehension task and integrate human reading strategies into our framework. A coarse-grained feature encoder and a co-attention mechanism are utilized to obtain a preliminary perception of intra-modality and inter-modality information. Then a fine-grained feature encoder and a conditioned interaction module are introduced to enhance the initial perception, inspired by how humans address reading comprehension problems. Moreover, to alleviate the huge computation burden of some existing methods, we further design an efficient choice comparison module and reduce the hidden size with imperceptible quality loss. Extensive experiments on Charades-STA, TACoS, and ActivityNet Captions datasets demonstrate that our solution outperforms existing state-of-the-art methods. Code is available at github.com/Huntersxsx/MGPN.

information, information retrieval, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3477495.3532083

2205.12886

Country:

Asia > China > Shanghai > Shanghai (0.05)
Europe > Spain > Galicia > Madrid (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Education (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Petulant AI-powered chatbots risk the integrity of search engines

#artificialintelligence, Feb-17-2023, 18:56:09 GMT

Search engines are some of the most popular sites on the internet, acting as a gateway to information for users. Whether you're looking to remember the URL for your online banking or the answer to a particularly tricky pub quiz question, the first port of call for many is Google. Traditionally, search engines haven't necessarily answered a question directly, instead filtering the results they find on the internet and presenting them as a list of links to websites they believe may be useful to your query. In recent years, Google has rolled out features including 'featured snippets' and 'knowledge panels' that try to give you the information directly, rather than sending you elsewhere. Artificial intelligence, or AI, is already used in search engines to rank relevant links or pull out key phrases of text to populate knowledge panels.

engine, information, search engine, (13 more...)

#artificialintelligence

Country: Europe > Switzerland > Zürich > Zürich (0.05)

Industry: Banking & Finance (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Joshi, Rishabh, Balachandran, Vidhisha, Saldanha, Emily, Glenski, Maria, Volkova, Svitlana, Tsvetkov, Yulia

arXiv.org Artificial Intelligence, Feb-17-2023

Keyphrase extraction aims at automatically extracting a list of "important" phrases representing the key concepts in a document. Prior approaches for unsupervised keyphrase extraction resorted to heuristic notions of phrase importance via embedding clustering or graph centrality, requiring extensive domain expertise. Our work presents a simple alternative approach which defines keyphrases as document phrases that are salient for predicting the topic of the document. To this end, we propose INSPECT -- an approach that uses self-explaining models for identifying influential keyphrases in a document by measuring the predictive impact of input phrases on the downstream task of the document topic classification. We show that this novel method not only alleviates the need for ad-hoc heuristics but also achieves state-of-the-art results in unsupervised keyphrase extraction in four datasets across two domains: scientific publications and news articles.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2203.0764

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(6 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)
(2 more...)
