Media
Why do so many AI company logos look like buttholes?
Feedback is New Scientist's popular sideways look at the latest science and technology news. You can submit items you believe may amuse readers to Feedback by emailing feedback@newscientist.com. The past few years have seen the emergence of a great many AI companies. This is extremely exciting/alarming (delete according to whether you bought shares early), but it has also had a secondary consequence. Along with the proliferation of AI companies has come a proliferation of AI company logos.
A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages
Vykopal, Ivan, Hyben, Martin, Moro, Robert, Gregor, Michal, Simko, Jakub
Online disinformation poses a global challenge, placing significant demands on fact-checkers who must verify claims efficiently to prevent the spread of false information. A major issue in this process is the redundant verification of already fact-checked claims, which increases workload and delays responses to newly emerging claims. This research introduces an approach that retrieves previously fact-checked claims, evaluates their relevance to a given input, and provides supplementary information to support fact-checkers. Our method employs large language models (LLMs) to filter irrelevant fact-checks and generate concise summaries and explanations, enabling fact-checkers to assess more quickly whether a claim has been verified before. In addition, we evaluate our approach through both automatic and human assessments, where humans interact with the developed tool to review its effectiveness. Our results demonstrate that LLMs are able to filter out many irrelevant fact-checks and, therefore, reduce effort and streamline the fact-checking process.
Fane at SemEval-2025 Task 10: Zero-Shot Entity Framing with Large Language Models
Fane, Enfa, Surdeanu, Mihai, Blanco, Eduardo, Corman, Steven R.
Understanding how news narratives frame entities is crucial for studying media's impact on societal perceptions of events. In this paper, we evaluate the zero-shot capabilities of large language models (LLMs) in classifying framing roles. Through systematic experimentation, we assess the effects of input context, prompting strategies, and task decomposition. Our findings show that a hierarchical approach, which first identifies broad roles and then fine-grained roles, outperforms single-step classification. We also demonstrate that optimal input contexts and prompts vary across task levels, highlighting the need for subtask-specific strategies. We achieve a Main Role Accuracy of 89.4% and an Exact Match Ratio of 34.5%, demonstrating the effectiveness of our approach. Our findings emphasize the importance of tailored prompt design and input context optimization for improving LLM performance in entity framing.
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
Afzal, Anum, Mercier, Alexandre, Matthes, Florian
Online platforms are increasingly interested in using Data-to-Text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We leverage Language Models such as T5, GPT-3.5, GPT-4, and LLaMa2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce a metric, JaccDiv, to evaluate the diversity of a set of texts. This research extends its relevance beyond the music industry, proving beneficial in various fields where repetitive automated content generation is prevalent.
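The abstract does not define JaccDiv, so the following is only an illustrative sketch of one way a Jaccard-based diversity score for a set of texts could be computed: the mean pairwise Jaccard distance between word sets. The function names, the whitespace tokenization, and the averaging scheme are all assumptions made for this example and should not be read as the authors' metric.

```python
from itertools import combinations


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_jaccard_diversity(texts):
    """Hypothetical diversity score: mean Jaccard distance over all text pairs.

    0.0 means every pair shares the same vocabulary; 1.0 means no pair
    shares any word. This is NOT the paper's JaccDiv definition.
    """
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0  # fewer than two texts: no pair to compare
    return sum(1.0 - jaccard_similarity(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a gallery of near-identical marketing texts scores close to 0, while texts with little shared vocabulary score close to 1.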
Disjunctive and Conjunctive Normal Form Explanations of Clusters Using Auxiliary Information
Downey, Robert F., Ravi, S. S.
We consider generating post-hoc explanations of clusters generated from various datasets using auxiliary information which was not used by clustering algorithms. Following terminology used in previous work, we refer to the auxiliary information as tags. Our focus is on two forms of explanations, namely disjunctive form (where the explanation for a cluster consists of a set of tags) and a two-clause conjunctive normal form (CNF) explanation (where the explanation consists of two sets of tags, combined through the AND operator). We use integer linear programming (ILP) as well as heuristic methods to generate these explanations. We experiment with a variety of datasets and discuss the insights obtained from our explanations. We also present experimental results regarding the scalability of our explanation methods.
An approach to melodic segmentation and classification based on filtering with the Haar-wavelet
Velarde, Gissel, Weyde, Tillman, Meredith, David
We present a novel method of classification and segmentation of melodies in symbolic representation. The method is based on filtering pitch as a signal over time with the Haar-wavelet, and we evaluate it on two tasks. The filtered signal corresponds to a single-scale signal w_s from the continuous Haar wavelet transform. The melodies are first segmented using local maxima or zero-crossings of w_s. The segments of w_s are then classified using the k-nearest neighbour algorithm with Euclidean and city-block distances. The method proves more effective than using unfiltered pitch signals and Gestalt-based segmentation when used to recognize the parent works of segments from Bach's Two-Part Inventions (BWV 772-786). When used to classify 360 Dutch folk tunes into 26 tune families, the performance of the method is comparable to the use of pitch signals, but not as good as that of string-matching methods based on multiple features.
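The filtering and segmentation steps described above can be sketched roughly as follows. This is a minimal discrete approximation, assuming a uniformly sampled pitch sequence, a Haar kernel spanning an even number of samples, and no padding at the boundaries; the function names and these choices are illustrative assumptions, not the authors' implementation.

```python
import math


def haar_filter(pitches, scale):
    """Correlate a pitch sequence with a Haar kernel of `scale` samples.

    The kernel is +1 over its first half and -1 over its second half,
    normalized by 1/sqrt(scale). Only positions where the kernel fits
    entirely inside the signal are returned (an assumption; the abstract
    does not specify boundary handling).
    """
    if scale < 2 or scale % 2:
        raise ValueError("scale must be an even integer >= 2")
    half = scale // 2
    norm = 1.0 / math.sqrt(scale)
    filtered = []
    for t in range(len(pitches) - scale + 1):
        first = sum(pitches[t:t + half])
        second = sum(pitches[t + half:t + scale])
        filtered.append(norm * (first - second))
    return filtered


def zero_crossings(signal):
    """Indices i where the signal strictly changes sign between i-1 and i.

    Samples that are exactly zero are not counted as crossings here.
    """
    return [i for i in range(1, len(signal)) if signal[i - 1] * signal[i] < 0]


# Example: a melody that rises, falls, then rises again (MIDI pitch numbers).
melody = [60, 62, 64, 62, 60, 62, 64]
w_s = haar_filter(melody, scale=2)
boundaries = zero_crossings(w_s)  # candidate segment boundaries
```

In this sketch the zero-crossings of the filtered signal mark changes in melodic direction, which is why they can serve as candidate segment boundaries before the k-nearest-neighbour step.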
The When and How of Target Variable Transformations
The machine learning pipeline typically involves the iterative process of (1) collecting the data, (2) preparing the data, (3) learning a model, and (4) evaluating a model. Practitioners recognize the importance of the data preparation phase in terms of its impact on the ability to learn accurate models. In this regard, significant attention is often paid to manipulating the feature set (e.g., selection, transformations, dimensionality reduction). A point that is less well appreciated is that transformations on the target variable can also have a large impact on whether it is possible to learn a suitable model. These transformations may include accounting for subject-specific biases (e.g., in how someone uses a rating scale), contexts (e.g., population size effects), and general trends (e.g., inflation). However, this point has received a much more cursory treatment in the existing literature. The goal of this paper is three-fold. First, we aim to highlight the importance of this problem by showing when transforming the target variable has been useful in practice. Second, we will provide a set of generic "rules of thumb" that indicate situations when transforming the target variable may be needed. Third, we will discuss which transformations should be considered in a given situation.
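As one illustration of a subject-specific target transformation of the kind mentioned above, the sketch below standardizes each rating against the mean and spread of the person who gave it. The choice of a per-subject z-score, the function name, and the record layout are assumptions for this example; the abstract does not say which transformations the paper recommends.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_per_subject(records):
    """Convert (subject_id, rating) pairs to per-subject z-scores.

    Each rating is centred on its subject's mean and divided by that
    subject's population standard deviation. Subjects whose ratings never
    vary map to 0.0. Output order matches input order.
    """
    by_subject = defaultdict(list)
    for subject, rating in records:
        by_subject[subject].append(rating)
    stats = {s: (mean(v), pstdev(v)) for s, v in by_subject.items()}

    transformed = []
    for subject, rating in records:
        mu, sigma = stats[subject]
        transformed.append(0.0 if sigma == 0 else (rating - mu) / sigma)
    return transformed


# A harsh rater ("a") and a lenient rater ("b") on a 1-5 scale.
ratings = [("a", 1), ("a", 3), ("b", 4), ("b", 5)]
targets = standardize_per_subject(ratings)
```

After the transformation, each rater's lower score maps to -1.0 and the higher score to 1.0, so a model trained on `targets` sees relative preference rather than each rater's habitual use of the scale.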
Information Retrieval in the Age of Generative AI: The RGB Model
Garetto, Michele, Cornacchia, Alessandro, Galante, Franco, Leonardi, Emilio, Nordio, Alessandro, Tarable, Alberto
The advent of Large Language Models (LLMs) and generative AI is fundamentally transforming information retrieval and processing on the Internet, bringing both great potential and significant concerns regarding content authenticity and reliability. This paper presents a novel quantitative approach to shed light on the complex information dynamics arising from the growing use of generative AI tools. Despite their significant impact on the digital ecosystem, these dynamics remain largely uncharted and poorly understood. We propose a stochastic model to characterize the generation, indexing, and dissemination of information in response to new topics. This scenario particularly challenges current LLMs, which often rely on real-time Retrieval-Augmented Generation (RAG) techniques to overcome their static knowledge limitations. Our findings suggest that the rapid pace of generative AI adoption, combined with increasing user reliance, can outpace human verification, escalating the risk of inaccurate information proliferation across digital resources. An in-depth analysis of Stack Exchange data confirms that high-quality answers inevitably require substantial time and human effort to emerge. This underscores the considerable risks associated with generating persuasive text in response to new questions and highlights the critical need for responsible development and deployment of future generative AI tools.
Wavelet-Filtering of Symbolic Music Representations for Folk Tune Segmentation and Classification
Velarde, Gissel, Weyde, Tillman, Meredith, David
The aim of this study is to evaluate a machine-learning method in which symbolic representations of folk songs are segmented and classified into tune families with Haar-wavelet filtering. The method is compared with a previously proposed Gestalt-based method. Melodies are represented as discrete symbolic pitch-time signals. We apply the continuous wavelet transform (CWT) with the Haar wavelet at specific scales, obtaining filtered versions of melodies emphasizing their information at particular time-scales. We use the filtered signal for representation and segmentation, using the wavelet coefficients' local maxima to indicate local boundaries and classify segments by means of k-nearest neighbours based on standard vector-metrics (Euclidean, cityblock), and compare the results to a Gestalt-based segmentation method and metrics applied directly to the pitch signal. We found that the wavelet based segmentation and wavelet-filtering of the pitch signal lead to better classification accuracy in cross-validated evaluation when the time-scale and other parameters are optimized.
Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User
Wang, Xiaolei, Xia, Chunxuan, Li, Junyi, Meng, Fanzhe, Huang, Lei, Wang, Jinpeng, Zhao, Wayne Xin, Wen, Ji-Rong
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations. A fundamental challenge in CRSs lies in effectively understanding user preferences from conversations. User preferences can be multifaceted and complex, posing significant challenges for accurate recommendations even with access to abundant external knowledge. While interaction with users can clarify their true preferences, frequent user involvement can lead to a degraded user experience. To address this problem, we propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs. The simulated user provides feedback on the items recommended by CRSs, enabling them to better capture intricate user preferences through multi-turn interaction. Inspired by generative reward models, we design two types of feedback actions for the simulated user: generative item scoring, which offers coarse-grained feedback, and attribute-based item critique, which provides fine-grained feedback. To ensure seamless integration, these feedback actions are unified into an instruction-based format, allowing the development of a unified simulated user via instruction tuning on synthesized data. With this simulated user, automatic multi-turn interaction with CRSs can be effectively conducted. Furthermore, to strike a balance between effectiveness and efficiency, we draw inspiration from the paradigm of reward-guided search in complex reasoning tasks and employ beam search for the interaction process. On top of this, we propose an efficient candidate ranking method to improve the recommendation results derived from interaction. Extensive experiments on public datasets demonstrate the effectiveness, efficiency, and transferability of our approach.