AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

PREME: Preference-based Meeting Exploration through an Interactive Questionnaire

Arabzadeh, Negar, Ahmadvand, Ali, Kiseleva, Julia, Liu, Yang, Awadallah, Ahmed Hassan, Zhong, Ming, Shokouhi, Milad

arXiv.org Artificial Intelligence | Apr-26-2023

The recent increase in the volume of online meetings necessitates automated tools for managing and organizing the material, especially when an attendee has missed the discussion and needs assistance in quickly exploring it. In this work, we propose a novel end-to-end framework for generating interactive questionnaires for preference-based meeting exploration. As a result, users are supplied with a list of suggested questions reflecting their preferences. Since the task is new, we introduce an automatic evaluation strategy. Namely, it measures how answerable the questions generated via the questionnaire are, to ensure factual correctness, and how well they cover the source meeting, to gauge the depth of possible exploration.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2205.0237

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Maryland > Baltimore (0.04)
(8 more...)

Genre:

Questionnaire & Opinion Survey (0.97)
Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Microsoft shares up 8.3% as AI features give a boost to sales

The Guardian | Apr-25-2023, 23:55:32 GMT

Microsoft Corp beat Wall Street's quarterly revenue and profit estimates on Tuesday, driven by growth in its cloud computing and Office productivity software businesses, and the company said artificial intelligence products were stimulating sales. The company forecast that revenue in its main segments for the current quarter would match or top Wall Street targets. Shares gained 8.3% in after-market trading following a report by the Redmond, Washington-based technology company that profits were $2.45 a share in the fiscal third quarter, beating Wall Street estimates of $2.23, according to data from Refinitiv and up 10% from the same quarter last year. In regular trading, fears about earnings had sent Microsoft down 2.2%, making it the biggest drag on the S&P 500 on Tuesday ahead of its report. Revenue rose 7% to $52.9bn in the quarter ended March, inching past the average analyst estimate of $51.02bn, according to Refinitiv.

ai feature, microsoft, revenue, (9 more...)

The Guardian

Country:

North America > United States > New York > New York County > New York City (0.73)
North America > United States > Washington > King County > Redmond (0.26)
Asia > Vietnam > Long An Province (0.06)

Genre: Financial News (0.57)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.44)

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

Llordes, Michael, Ganguly, Debasis, Bhatia, Sumit, Agarwal, Chirag

arXiv.org Artificial Intelligence | Apr-25-2023

Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations. These models, however, suffer from poor interpretability as they do not rely on explicit term matching. As a form of local per-query explanations, we introduce the notion of equivalent queries that are generated by maximizing the similarity between the NRM's results and the result set of a sparse retrieval system with the equivalent query. We then compare this approach with existing methods such as RM3-based query expansion and contrast differences in retrieval effectiveness and in the terms generated by each approach.
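To make the contrast with dense models concrete, the following is a minimal sketch of Okapi BM25, the kind of sparse scorer the title refers to. It is not the paper's equivalent-query method; the function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration. The point it shows is that only query terms that literally occur in a document contribute to its score, which is what makes a sparse ranking easy to inspect.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # only explicit term matches contribute
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "dense retrieval with neural encoders".split(),
    "sparse retrieval with bm25 term matching".split(),
    "cooking pasta at home".split(),
]
scores = bm25_scores(["bm25", "retrieval"], docs)
```

In this toy collection the second document matches both query terms and ranks first, while the third shares no term with the query and scores exactly zero, so every score can be traced back to specific matched terms.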

artificial intelligence, information retrieval query processing, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.12631

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > New York > New York County > New York City (0.05)
(8 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.78)

Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News

Koreeda, Yuta, Yokote, Ken-ichi, Ozaki, Hiroaki, Yamaguchi, Atsuki, Tsunokake, Masaya, Sogawa, Yasuhiro

arXiv.org Artificial Intelligence | Apr-25-2023

This paper describes the participation of team Hitachi in SemEval-2023 Task 3, "Detecting the genre, the framing, and the persuasion techniques in online news in a multi-lingual setup." Based on the multilingual, multi-task nature of the task and the low-resource setting, we investigated different cross-lingual and multi-task strategies for training the pretrained language models. Through extensive experiments, we found that (a) cross-lingual/multi-task training, and (b) collecting an external balanced dataset, can benefit the genre and framing detection. We constructed ensemble models from the results and achieved the highest macro-averaged F1 scores in Italian and Russian genre categorization subtasks.
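The macro-averaged F1 score mentioned above weights every class equally, regardless of how many examples it has. A minimal sketch of the computation follows; the function name `macro_f1` and the labels and predictions are hypothetical and are not taken from the task data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / len(f1_per_class)

# Hypothetical gold labels and predictions for a three-class genre task.
y_true = ["reporting", "reporting", "reporting", "opinion", "satire"]
y_pred = ["reporting", "reporting", "opinion", "opinion", "reporting"]
score = macro_f1(y_true, y_pred)
```

Here the rare class that is never predicted correctly contributes an F1 of zero and pulls the average down as much as a frequent class would, which is why the metric suits imbalanced, low-resource settings.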

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.01794

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Latvia > Riga Municipality > Riga (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.70)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities

Van Gysel, Christophe

arXiv.org Artificial Intelligence | Apr-25-2023

Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can be helpful to improve the accurate recognition of spoken information domain queries. Finally, we also provide a brief overview of current problems and challenges in speech recognition.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539618.3591849

2304.13149

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.06)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre:

Overview (0.68)
Research Report (0.50)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

GARCIA: Powering Representations of Long-tail Query with Multi-granularity Contrastive Learning

Wang, Weifan, Hu, Binbin, Peng, Zhicheng, Zhong, Mingjie, Zhang, Zhiqiang, Liu, Zhongyi, Zhang, Guannan, Zhou, Jun

arXiv.org Artificial Intelligence | Apr-24-2023

Recently, the growth of service platforms has brought great convenience to both users and merchants, and the service search engine plays a vital role in improving the user experience by quickly obtaining desirable results via textual queries. Unfortunately, users' uncontrollable search habits usually produce vast amounts of long-tail queries, which severely threaten the capability of search models. Inspired by recently emerging graph neural networks (GNNs) and contrastive learning (CL), several efforts have been made to alleviate the long-tail issue and have achieved considerable performance. Nevertheless, they still face a few major weaknesses. Most importantly, they do not explicitly utilize the contextual structure between heads and tails for effective knowledge transfer, and intention-level information is commonly ignored for more generalized representations. To this end, we develop a novel framework, GARCIA, which exploits graph-based knowledge transfer and intention-based representation generalization in a contrastive setting. In particular, we employ an adaptive encoder to produce informative representations for queries and services, as well as hierarchical structure-aware representations of intentions. To fully understand tail queries and services, we equip GARCIA with a novel multi-granularity contrastive learning module, which powers representations through knowledge transfer, structure enhancement and intention generalization. Subsequently, the complete GARCIA is trained in a pre-training & fine-tuning manner. Finally, we conduct extensive experiments in both offline and online environments, which demonstrate the superior capability of GARCIA in improving tail queries and overall performance in service search scenarios.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2304.12537

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

Constructing Tree-based Index for Efficient and Effective Dense Retrieval

Li, Haitao, Ai, Qingyao, Zhan, Jingtao, Mao, Jiaxin, Liu, Yiqun, Liu, Zheng, Cao, Zhao

arXiv.org Artificial Intelligence | Apr-24-2023

Recent studies have shown that Dense Retrieval (DR) techniques can significantly improve the performance of first-stage retrieval in IR systems. Despite its empirical effectiveness, the application of DR is still limited. In contrast to statistical retrieval models that rely on highly efficient inverted index solutions, DR models build dense embeddings that are difficult to pre-process with most existing search indexing systems. To avoid the expensive cost of brute-force search, Approximate Nearest Neighbor (ANN) algorithms and corresponding indexes are widely applied to speed up the inference process of DR models. Unfortunately, while ANN can improve the efficiency of DR models, it usually comes at a significant cost to retrieval performance. To solve this issue, we propose JTR, which stands for Joint optimization of TRee-based index and query encoding. Specifically, we design a new unified contrastive learning loss to train the tree-based index and query encoder in an end-to-end manner. The tree-based negative sampling strategy is applied to give the tree the maximum heap property, which supports the effectiveness of beam search well. Moreover, we treat the cluster assignment as an optimization problem to update the tree-based index, which allows overlapped clustering. We evaluate JTR on numerous popular retrieval benchmarks. Experimental results show that JTR achieves better retrieval performance while retaining high system efficiency compared with widely adopted baselines. It provides a potential solution to balance efficiency and effectiveness in neural retrieval system designs.

information retrieval, machine learning, node, (16 more...)

arXiv.org Artificial Intelligence

2304.11943

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Luo, Chuwei, Cheng, Changxu, Zheng, Qi, Yao, Cong

arXiv.org Artificial Intelligence | Apr-21-2023

Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most of the existing models learn the geometric representation in an implicit way, which has been found insufficient for the RE task since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting the performance of RE lies in the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM

data mining, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2304.10759

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.72)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)
Information Technology > Data Science > Data Mining > Text Mining (0.62)
(3 more...)

Artificial intelligence will replace Google's historic search engine - Plugavel

#artificialintelligence | Apr-18-2023, 13:16:04 GMT

Since the arrival of ChatGPT, and especially the integration of GPT, OpenAI's artificial intelligence, into Bing, Google's management has been on alert. AI development teams were forced to rush out the Bard chatbot, with major issues as a result. Today, as the New York Times reports that Samsung may well replace Google with Bing as the default search engine on its smartphones, the tension goes up a notch at the Internet giant. As a result, it seems that AI will take an important place at the next conference. Still according to the New York Times, more than 160 Google employees are currently working on an AI directly integrated into the search engine.

google, intelligence, search engine, (9 more...)

#artificialintelligence

Country:

North America > United States > New York (0.26)
Europe (0.06)

Industry: Information Technology > Services (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.37)

Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search

Wang, Wenping, Guo, Yunxi, Shen, Chiyao, Ding, Shuai, Liao, Guangdeng, Fu, Hao, Prabhakar, Pramodh Karanth

arXiv.org Artificial Intelligence | Apr-18-2023

Embedding-based retrieval is used in a variety of search applications such as e-commerce and social networking search. While the approach has demonstrated its efficacy in tasks like semantic matching and contextual search, it is plagued by the problem of uncontrollable relevance. In this paper, we conduct an analysis of embedding-based retrieval launched in early 2021 on our social network search engine, and define two main categories of failures introduced by it: integrity and junkiness. The former refers to issues such as hate speech and offensive content that can severely harm user experience, while the latter includes irrelevant results like fuzzy text matching or language mismatches. Efficient methods during model inference are further proposed to resolve the issue, including indexing treatments and targeted user cohort treatments. Though simple, the methods yield good gains in offline NDCG and online A/B test metrics in practice. We analyze the reasons for the improvements, pointing out that our methods are only preliminary attempts at this important but challenging problem. We put forward potential future directions to explore.
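For readers unfamiliar with the offline metric named above, here is a minimal sketch of NDCG (normalized discounted cumulative gain). It uses the linear-gain variant and hypothetical graded relevance labels; the abstract does not say which variant or cutoff the authors used, so the function names `dcg` and `ndcg` and all values below are assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """DCG of the ranking as given, divided by the DCG of the ideal ordering."""
    ranked = relevances[:k] if k is not None else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded relevance of results, listed in the order they were ranked.
perfect = ndcg([3, 2, 1, 0])
reversed_order = ndcg([0, 1, 2, 3])
```

A ranking that places the most relevant results first scores 1.0, and pushing relevant results down the list lowers the score, so a junk result near the top is penalized more than one near the bottom.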

artificial intelligence, information retrieval, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539618.3591831

2304.09287

Country:

North America > United States > California > San Mateo County > Menlo Park (0.06)
Asia > Taiwan > Taiwan Province > Taipei (0.06)
North America > United States > New York > New York County > New York City (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)
