Information Retrieval
Perplexity will put ads in its AI search engine and share revenue with publishers
When people type a question into Perplexity, the two-year-old search engine scours the internet and uses information from multiple sources, including online publishers, to synthesize an answer using AI. Soon, Perplexity will start sharing revenue with some publishers as part of an advertising platform it plans to launch around the end of September, the company announced on Tuesday. The initiative, known as the Perplexity Publishers' Program, comes less than two months after the San Francisco-based startup backed by investors like Jeff Bezos and NVIDIA, and valued at 3 billion, came under fire from Forbes, Wired, and Condรฉ Nast for allegedly scraping content without permission and ignoring robots.txt, Perplexity's initial partners include TIME, Fortune, The Texas Tribune, Der Spiegel and Automattic, the company behind Wordpress.com. It's not clear exactly how much revenue Perplexity will share with publishers.
Event-Arguments Extraction Corpus and Modeling using BERT for Arabic
Aljabari, Alaa, Duaibes, Lina, Jarrar, Mustafa, Khalilia, Mohammed
Event-argument extraction is a challenging task, particularly in Arabic due to sparse linguistic resources. To fill this gap, we introduce the \hadath corpus ($550$k tokens) as an extension of Wojood, enriched with event-argument annotations. We used three types of event arguments: $agent$, $location$, and $date$, which we annotated as relation types. Our inter-annotator agreement evaluation resulted in $82.23\%$ $Kappa$ score and $87.2\%$ $F_1$-score. Additionally, we propose a novel method for event relation extraction using BERT, in which we treat the task as text entailment. This method achieves an $F_1$-score of $94.01\%$. To further evaluate the generalization of our proposed method, we collected and annotated another out-of-domain corpus (about $80$k tokens) called \testNLI and used it as a second test set, on which our approach achieved promising results ($83.59\%$ $F_1$-score). Last but not least, we propose an end-to-end system for event-arguments extraction. This system is implemented as part of SinaTools, and both corpora are publicly available at {\small \url{https://sina.birzeit.edu/wojood}}
Label-Guided Prompt for Multi-label Few-shot Aspect Category Detection
Guan, ChaoFeng, Zhu, YaoHui, Bai, Yu, Wang, LingYun
Multi-label few-shot aspect category detection aims at identifying multiple aspect categories from sentences with a limited number of training instances. The representation of sentences and categories is a key issue in this task. Most of current methods extract keywords for the sentence representations and the category representations. Sentences often contain many category-independent words, which leads to suboptimal performance of keyword-based methods. Instead of directly extracting keywords, we propose a label-guided prompt method to represent sentences and categories. To be specific, we design label-specific prompts to represent sentences by combining crucial contextual and semantic information. Further, the label is introduced into a prompt to obtain category descriptions by utilizing a large language model. This kind of category descriptions contain the characteristics of the aspect categories, guiding the construction of discriminative category prototypes. Experimental results on two public datasets show that our method outperforms current state-of-the-art methods with a 3.86% - 4.75% improvement in the Macro-F1 score.
Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise
Liu, Yuhan, Wang, Sheng, Liu, Yixuan, Li, Feifei, Chen, Hong
The Sparse Vector Technique (SVT) is one of the most fundamental tools in differential privacy (DP). It works as a backbone for adaptive data analysis by answering a sequence of queries on a given dataset, and gleaning useful information in a privacy-preserving manner. Unlike the typical private query releases that directly publicize the noisy query results, SVT is less informative -- it keeps the noisy query results to itself and only reveals a binary bit for each query, indicating whether the query result surpasses a predefined threshold. To provide a rigorous DP guarantee for SVT, prior works in the literature adopt a conservative privacy analysis by assuming the direct disclosure of noisy query results as in typical private query releases. This approach, however, hinders SVT from achieving higher query accuracy due to an overestimation of the privacy risks, which further leads to an excessive noise injection using the Laplacian or Gaussian noise for perturbation. Motivated by this, we provide a new privacy analysis for SVT by considering its less informative nature. Our analysis results not only broaden the range of applicable noise types for perturbation in SVT, but also identify the exponential noise as optimal among all evaluated noises (which, however, is usually deemed non-applicable in prior works). The main challenge in applying exponential noise to SVT is mitigating the sub-optimal performance due to the bias introduced by noise distributions. To address this, we develop a utility-oriented optimal threshold correction method and an appending strategy, which enhances the performance of SVT by increasing the precision and recall, respectively. The effectiveness of our proposed methods is substantiated both theoretically and empirically, demonstrating significant improvements up to $50\%$ across evaluated metrics.
Generative Retrieval with Preference Optimization for E-commerce Search
Li, Mingming, Wang, Huimu, Chen, Zuxu, Nie, Guangtao, Qiu, Yiming, Wang, Binbin, Tang, Guoyu, Liu, Lin, Zhuo, Jingwei
Generative retrieval introduces a groundbreaking paradigm to document retrieval by directly generating the identifier of a pertinent document in response to a specific query. This paradigm has demonstrated considerable benefits and potential, particularly in representation and generalization capabilities, within the context of large language models. However, it faces significant challenges in E-commerce search scenarios, including the complexity of generating detailed item titles from brief queries, the presence of noise in item titles with weak language order, issues with long-tail queries, and the interpretability of results. To address these challenges, we have developed an innovative framework for E-commerce search, called generative retrieval with preference optimization. This framework is designed to effectively learn and align an autoregressive model with target data, subsequently generating the final item through constraint-based beam search. By employing multi-span identifiers to represent raw item titles and transforming the task of generating titles from queries into the task of generating multi-span identifiers from queries, we aim to simplify the generation process. The framework further aligns with human preferences using click data and employs a constrained search method to identify key spans for retrieving the final item, thereby enhancing result interpretability. Our extensive experiments show that this framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate the superiority and effectiveness in improving conversion gains.
Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search
Mo, Fengran, Qu, Chen, Mao, Kelong, Wu, Yihong, Su, Zhan, Huang, Kaiyu, Nie, Jian-Yun
Conversational search supports multi-turn user-system interactions to solve complex information needs. Different from the traditional single-turn ad-hoc search, conversational search encounters a more challenging problem of context-dependent query understanding with the lengthy and long-tail conversational history context. While conversational query rewriting methods leverage explicit rewritten queries to train a rewriting model to transform the context-dependent query into a stand-stone search query, this is usually done without considering the quality of search results. Conversational dense retrieval methods use fine-tuning to improve a pre-trained ad-hoc query encoder, but they are limited by the conversational search data available for training. In this paper, we leverage both rewritten queries and relevance judgments in the conversational search data to train a better query representation model. The key idea is to align the query representation with those of rewritten queries and relevant documents. The proposed model -- Query Representation Alignment Conversational Dense Retriever, QRACDR, is tested on eight datasets, including various settings in conversational search and ad-hoc search. The results demonstrate the strong performance of QRACDR compared with state-of-the-art methods, and confirm the effectiveness of representation alignment.
Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription
Zehren, Mickaรซl, Alunno, Marco, Bientinesi, Paolo
Automatic drum transcription is a critical tool in Music Information Retrieval for extracting and analyzing the rhythm of a music track, but it is limited by the size of the datasets available for training. A popular method used to increase the amount of data is by generating them synthetically from music scores rendered with virtual instruments. This method can produce a virtually infinite quantity of tracks, but empirical evidence shows that models trained on previously created synthetic datasets do not transfer well to real tracks. In this work, besides increasing the amount of data, we identify and evaluate three more strategies that practitioners can use to improve the realism of the generated data and, thus, narrow the synthetic-to-real transfer gap. To explore their efficacy, we used them to build a new synthetic dataset and then we measured how the performance of a model scales and, specifically, at what value it will stagnate when increasing the number of training tracks for different datasets. By doing this, we were able to prove that the aforementioned strategies contribute to make our dataset the one with the most realistic data distribution and the lowest synthetic-to-real transfer gap among the synthetic datasets we evaluated. We conclude by highlighting the limits of training with infinite data in drum transcription and we show how they can be overcome.
Evaluating LLMs for Text-to-SQL Generation With Complex SQL Workload
This study presents a comparative analysis of the a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity compared to the other two benchmarks. This underscores the need for more intricate benchmarks to simulate realistic scenarios effectively. To facilitate this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results of this study can guide future research in the development of more sophisticated text-to-SQL benchmarks. We utilized 11 distinct Language Models (LLMs) to generate SQL queries based on the query descriptions provided by the TPC-DS benchmark. The prompt engineering process incorporated both the query description as outlined in the TPC-DS specification and the database schema of TPC-DS. Our findings indicate that the current state-of-the-art generative AI models fall short in generating accurate decision-making queries. We conducted a comparison of the generated queries with the TPC-DS gold standard queries using a series of fuzzy structure matching techniques based on query features. The results demonstrated that the accuracy of the generated queries is insufficient for practical real-world application.
The Morning After: OpenAI reveals its AI-powered search engine, SearchGPT
OpenAI announced a new AI-powered search engine prototype called SearchGPT. It's described SearchGPT as "a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources." The company plans to test out the product with 10,000 initial users, then roll it into ChatGPT after gathering feedback. It's a spicy time to launch AI-powered search engines. Last month, Perplexity faced criticism for summarizing stories from Forbes and Wired without adequate attribution or backlinks to the publications.
ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema
Wang, Fei, Zheng, Yuewen, Li, Qin, Wu, Jingyi, Li, Pengfei, Zhang, Luxia
Objective: This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in medical paper reports using a combination of Large Multimodal Models (LMMs) and Optical Character Recognition (OCR) based on the schema. By integrating predefined schema, we intend to enable LMMs to directly extract and standardize information according to the schema specifications, facilitating further data entry. Method: Our approach involves a two-stage process, including classification and extraction for categorizing report scenarios and structuring information. We established and annotated a dataset to verify the effectiveness of ChatSchema, and evaluated key extraction using precision, recall, F1-score, and accuracy metrics. Based on key extraction, we further assessed value extraction. We conducted ablation studies on two LMMs to illustrate the improvement of structured information extraction with different input modals and methods. Result: We analyzed 100 medical reports from Peking University First Hospital and established a ground truth dataset with 2,945 key-value pairs. We evaluated ChatSchema using GPT-4o and Gemini 1.5 Pro and found a higher overall performance of GPT-4o. The results are as follows: For the result of key extraction, key-precision was 98.6%, key-recall was 98.5%, key-F1-score was 98.6%. For the result of value extraction based on correct key extraction, the overall accuracy was 97.2%, precision was 95.8%, recall was 95.8%, and F1-score was 95.8%. An ablation study demonstrated that ChatSchema achieved significantly higher overall accuracy and overall F1-score of key-value extraction, compared to the Baseline, with increases of 26.9% overall accuracy and 27.4% overall F1-score, respectively.