Media
Hierarchical Document Refinement for Long-context Retrieval-augmented Generation
Jin, Jiajie, Li, Xiaoxi, Dong, Guanting, Zhang, Yuyao, Zhu, Yutao, Wu, Yongkang, Li, Zhonghua, Ye, Qi, Dou, Zhicheng
Real-world RAG applications often encounter long-context input scenarios, where redundant information and noise result in higher inference costs and reduced performance. To address these challenges, we propose LongRefiner, an efficient plug-and-play refiner that leverages the inherent structural characteristics of long documents. LongRefiner employs dual-level query analysis, hierarchical document structuring, and adaptive refinement through multi-task learning on a single foundation model. Experiments on seven QA datasets demonstrate that LongRefiner achieves competitive performance in various scenarios while incurring 10x lower computational cost and latency than the best baseline. Further analysis validates that LongRefiner is scalable, efficient, and effective, providing practical insights for real-world long-text RAG applications. Our code is available at https://github.com/ignorejjj/LongRefiner.
Disaggregated Deep Learning via In-Physics Computing at Radio Frequency
Gao, Zhihui, Vadlamani, Sri Krishna, Sulimany, Kfir, Englund, Dirk, Chen, Tingjun
Modern edge devices, such as cameras, drones, and Internet-of-Things nodes, rely on deep learning to enable a wide range of intelligent applications, including object recognition, environment perception, and autonomous navigation. However, deploying deep learning models directly on the often resource-constrained edge devices demands significant memory footprints and computational power for real-time inference using traditional digital computing architectures. In this paper, we present WISE, a novel computing architecture for wireless edge networks designed to overcome energy constraints in deep learning inference. WISE achieves this goal through two key innovations: disaggregated model access via wireless broadcasting and in-physics computation of general complex-valued matrix-vector multiplications directly at radio frequency. Using a software-defined radio platform with wirelessly broadcast model weights over the air, we demonstrate that WISE achieves 95.7% image classification accuracy with ultra-low operation power of 6.0 fJ/MAC per client, corresponding to a computation efficiency of 165.8 TOPS/W. This approach enables energy-efficient deep learning inference on wirelessly connected edge devices, achieving more than two orders of magnitude improvement in efficiency compared to traditional digital computing.
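The two efficiency figures in this abstract are related by a unit conversion, which the sketch below works through. It assumes that one MAC is counted as one operation; the abstract does not state the convention, but the quoted numbers appear consistent with it. Under that assumption, 6.0 fJ/MAC corresponds to about 166.7 TOPS/W, so the quoted 165.8 TOPS/W presumably reflects an unrounded energy of about 6.03 fJ/MAC. The function names are illustrative.

```python
def tops_per_watt(energy_fj_per_mac: float, ops_per_mac: float = 1.0) -> float:
    """Convert energy per MAC (in femtojoules) to efficiency in TOPS/W.

    ops_per_mac=1.0 is an assumption: one MAC is counted as one operation.
    """
    joules_per_op = energy_fj_per_mac * 1e-15 / ops_per_mac
    ops_per_joule = 1.0 / joules_per_op  # same as operations per second per watt
    return ops_per_joule / 1e12          # tera-operations per second per watt


def fj_per_mac(tops_w: float, ops_per_mac: float = 1.0) -> float:
    """Inverse conversion: efficiency in TOPS/W to energy per MAC in femtojoules."""
    joules_per_op = 1.0 / (tops_w * 1e12)
    return joules_per_op * ops_per_mac * 1e15


if __name__ == "__main__":
    print(f"{tops_per_watt(6.0):.1f} TOPS/W")   # about 166.7
    print(f"{fj_per_mac(165.8):.2f} fJ/MAC")    # about 6.03
```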
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M
Di Palma, Dario, Merra, Felice Antonio, Sfilio, Maurizio, Anelli, Vito Walter, Narducci, Fedelucio, Di Noia, Tommaso
Large Language Models (LLMs) have become increasingly central to recommendation scenarios due to their remarkable natural language understanding and generation capabilities. Although significant research has explored the use of LLMs for various recommendation tasks, little effort has been dedicated to verifying whether they have memorized public recommendation datasets as part of their training data. This is undesirable because memorization reduces the generalizability of research findings, as benchmarking on memorized datasets does not guarantee generalization to unseen datasets. Furthermore, memorization can amplify biases; for example, some popular items may be recommended more frequently than others. In this work, we investigate whether LLMs have memorized public recommendation datasets. Specifically, we examine two model families (GPT and Llama) across multiple sizes, focusing on one of the most widely used datasets in recommender systems: MovieLens-1M. First, we define dataset memorization as the extent to which item attributes, user profiles, and user-item interactions can be retrieved by prompting the LLMs. Second, we analyze the impact of memorization on recommendation performance. Lastly, we examine whether memorization varies across model families and model sizes. Our results reveal that all models exhibit some degree of memorization of MovieLens-1M, and that recommendation performance is related to the extent of memorization. We have made all the code publicly available at: https://github.com/sisinflab/LLM-MemoryInspector
Detecting Musical Deepfakes
Abstract -- The proliferation of Text-to-Music (TTM) platforms has democratized music creation, letting users effortlessly generate high-quality compositions. However, this innovation has also introduced challenges to musicians and the music industry. This research focuses on utilizing the FakeMusicCaps dataset to address the challenge of detecting AI-generated songs by classifying the audio as deepfake or human. To simulate a real-world adversarial entity, tempo stretching and pitch shifting modifications were applied to the dataset. Mel Spectrograms were generated from the resulting datasets, which were then used to train and test a convolutional neural network. This paper also explores the ethical and societal implications of TTM platforms, suggesting that detection systems developed and employed with care are a necessary tool to safeguard musicians and foster the positive potential of TTM platforms and generative AI in music. Rapid advances in generative AI have caused the creative landscape to be upended, enabling almost anyone to easily create music that can be hard to distinguish from human-made compositions. AI-generated music is part of a wider classification of AI-generated media and art that falls under the category of "deepfake".
Social media giant hit with scathing ad campaign amid anger over AI chatbots sexually exploiting kids
A nonprofit parents coalition is calling on multiple congressional committees to launch an investigation into Meta for prioritizing engagement metrics that put children's safety at risk. The call is part of a three-pronged attack campaign by the American Parents Coalition (APC), launched Thursday. It includes a letter to lawmakers with calls for investigations, a new parental notification system to help parents stay informed on issues impacting their kids at Meta and beyond, and mobile billboards at Meta's D.C. and California headquarters, calling out the company for its failure to adequately prioritize protecting children. APC's campaign follows an April Wall Street Journal report that included an investigation looking into how the company's metrics focus has led to potential harms for children. "This is not the first time Meta has been caught making tech available to kids that exposes them to inappropriate content," APC Executive Director Alleigh Marre said. "Parents across America should be extremely wary of their children's online activity, especially when it involves emerging technology like AI digital companions.
Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation
We introduce WebApp1K, a novel benchmark for evaluating large language models (LLMs) in test-driven development (TDD) tasks, where test cases serve as both prompt and verification for code generation. Unlike traditional approaches relying on natural language prompts, our benchmark emphasizes the ability of LLMs to interpret and implement functionality directly from test cases, reflecting real-world software development practices. Comprising 1000 diverse challenges across 20 application domains, the benchmark evaluates LLMs on their ability to generate compact, functional code under the constraints of context length and multi-feature complexity. Our findings highlight instruction following and in-context learning as critical capabilities for TDD success, surpassing the importance of general coding proficiency or pretraining knowledge. Through comprehensive evaluation of 19 frontier models, we reveal performance bottlenecks, such as instruction loss in long prompts, and provide a detailed error analysis spanning multiple root causes. This work underscores the practical value of TDD-specific benchmarks and lays the foundation for advancing LLM capabilities in rigorous, application-driven coding scenarios.
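The "tests as prompt" idea, in which one test file serves as both the instruction and the verifier, can be sketched as follows. This is a minimal illustration and not the benchmark's actual harness: Python, the `slugify` task, the helper names `build_prompt` and `passes_tests`, and the two hard-coded candidate strings (standing in for model outputs) are all assumptions made for the example.

```python
import io
import unittest

# The same test source is used twice: as the prompt text and as the verifier.
TEST_SOURCE = '''
import unittest

class TestSlugify(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  a   b "), "a-b")
'''


def build_prompt(test_source: str) -> str:
    """Build a prompt whose only specification is the test code itself."""
    return "Write Python code that makes these tests pass:\n" + test_source


def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Run the tests against a candidate; True only if every test passes.

    exec() is used for brevity; a real harness would sandbox generated code.
    """
    namespace = {}
    exec(candidate_source, namespace)  # define the candidate implementation
    exec(test_source, namespace)       # define the test classes alongside it
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for obj in list(namespace.values()):
        if isinstance(obj, type) and issubclass(obj, unittest.TestCase):
            suite.addTests(loader.loadTestsFromTestCase(obj))
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.testsRun > 0 and result.wasSuccessful()


# Hypothetical model outputs, hard-coded for illustration.
GOOD_CANDIDATE = 'def slugify(text):\n    return "-".join(text.lower().split())\n'
BAD_CANDIDATE = 'def slugify(text):\n    return text.lower()\n'

if __name__ == "__main__":
    print(build_prompt(TEST_SOURCE))
    print(passes_tests(GOOD_CANDIDATE, TEST_SOURCE))
    print(passes_tests(BAD_CANDIDATE, TEST_SOURCE))
```

Because the prompt contains no natural-language description of the feature, a candidate passes only if the model inferred the intended behavior from the assertions alone.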
Deconstructing Jazz Piano Style Using Machine Learning
Cheston, Huw, Bance, Reuben, Harrison, Peter M. C.
For a visual artist, their style might include aspects such as subject choice, colour choice, and brush techniques; for a writer, it might include vocabulary, syntactic constructions, and narrative archetypes; for a composer, it might include harmonic progressions, rhythmic patterns, and melodic motifs. Individual differences across all these parameters, and more, come together to define each artist's unique style. Most of these stylistic parameters can theoretically be assessed by human experts. However, such assessments are necessarily slow and hence hard to apply at scale. Subjectivity is also a problem, since every human analyst comes with their own history of artistic exposure that will inevitably affect how they interpret artworks. Computational methods promise a more scalable and objective approach to this problem. Once a researcher has crafted an algorithm that captures a particular stylistic parameter -- for example, using entropy to capture vocabulary complexity -- then a computer can easily apply the algorithm to large datasets, and hence compare different artists using this parameter (Abry et al., 2013; Cheston et al., 2024b; Deepaisarn et al., 2023; Li et al., 2012).
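The example of using entropy to capture vocabulary complexity can be made concrete with a short sketch. The code below computes the Shannon entropy, in bits, of the empirical token distribution of a sequence; the function name `vocabulary_entropy` and the whitespace tokenization are assumptions for illustration and are not taken from the cited works.

```python
import math
from collections import Counter


def vocabulary_entropy(tokens):
    """Shannon entropy (in bits) of the empirical distribution over tokens.

    Higher values mean usage is spread more evenly over a larger vocabulary.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over every distinct token.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    repetitive = "the the the the".split()
    varied = "the cat sat down".split()
    print(vocabulary_entropy(repetitive))  # 0.0: a single repeated word
    print(vocabulary_entropy(varied))      # 2.0: four equally likely words
```

The same function applies to any sequence of hashable symbols, so it could equally be run over chord labels or rhythmic patterns once those have been extracted.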
A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science
Aytar, Ahmet Yasin, Kilic, Kemal, Kaya, Kamer
In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This paper presents an enhanced Retrieval-Augmented Generation (RAG) application, an artificial intelligence (AI)-based system designed to assist data scientists in accessing precise and contextually relevant academic resources. The AI-powered application integrates advanced techniques, including the GeneRation Of BIbliographic Data (GROBID) technique for extracting bibliographic information, fine-tuned embedding models, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. This implementation of AI specifically addresses the challenge of academic literature navigation. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system's effectiveness in reducing information overload and enhancing decision-making processes. Our findings highlight the potential of this enhanced Retrieval-Augmented Generation system to transform academic exploration within data science, ultimately advancing the workflow of research and innovation in the field.
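One plausible reading of an abstract-first retrieval method is a two-stage search: shortlist papers by how well their abstracts match the query, then rank text chunks only within the shortlisted papers. The sketch below illustrates that reading and is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for the fine-tuned embedding models mentioned in the abstract, and the function names, parameters, and sample papers are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def abstract_first_retrieve(query, papers, top_papers=1, top_chunks=2):
    """Return (title, chunk) pairs, searching chunks only in shortlisted papers."""
    q = embed(query)
    # Stage 1: shortlist papers by abstract similarity.
    shortlisted = sorted(
        papers, key=lambda p: cosine(q, embed(p["abstract"])), reverse=True
    )[:top_papers]
    # Stage 2: rank chunks, restricted to the shortlisted papers.
    scored = [
        (cosine(q, embed(chunk)), paper["title"], chunk)
        for paper in shortlisted
        for chunk in paper["chunks"]
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(title, chunk) for _, title, chunk in scored[:top_chunks]]


# Hypothetical sample data.
PAPERS = [
    {
        "title": "Graph Sampling",
        "abstract": "We study sampling methods for large graphs.",
        "chunks": [
            "Random walk sampling preserves degree distribution.",
            "We evaluate on social graphs.",
        ],
    },
    {
        "title": "Time Series Forecasting",
        "abstract": "We forecast time series with transformers.",
        "chunks": [
            "Transformers capture long range seasonality.",
            "Sampling windows for training.",
        ],
    },
]

if __name__ == "__main__":
    for title, chunk in abstract_first_retrieve("graph sampling methods", PAPERS):
        print(title, "|", chunk)
```

In this sample, the chunk "Sampling windows for training." shares a word with the query but is never considered, because its paper's abstract did not make the shortlist. That filtering step is what distinguishes this approach from ranking all chunks directly.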
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
Ke, Bingxin, Qu, Kevin, Wang, Tianfu, Metzger, Nando, Huang, Shengyu, Li, Bo, Obukhov, Anton, Schindler, Konrad
The success of deep learning in computer vision over the past decade has hinged on large labeled datasets and strong pretrained models. In data-scarce settings, the quality of these pretrained models becomes crucial for effective transfer learning. Image classification and self-supervised learning have traditionally been the primary methods for pretraining CNNs and transformer-based architectures. Recently, the rise of text-to-image generative models, particularly those using denoising diffusion in a latent space, has introduced a new class of foundational models trained on massive, captioned image datasets. These models' ability to generate realistic images of unseen content suggests they possess a deep understanding of the visual world. In this work, we present Marigold, a family of conditional generative models and a fine-tuning protocol that extracts the knowledge from pretrained latent diffusion models like Stable Diffusion and adapts them for dense image analysis tasks, including monocular depth estimation, surface normals prediction, and intrinsic decomposition. Marigold requires minimal modification of the pre-trained latent diffusion model's architecture, trains with small synthetic datasets on a single GPU over a few days, and demonstrates state-of-the-art zero-shot generalization. Project page: https://marigoldcomputervision.github.io
Ethical Aspects of the Use of Social Robots in Elderly Care -- A Systematic Qualitative Review
Leineweber, Marianne, Keusgen, Clara Victoria, Bubeck, Marc, Haltaufderheide, Joschka, Ranisch, Robert, Klingler, Corinna
Background: The use of social robotics in elderly care is increasingly discussed as one way of meeting emerging care needs due to scarce resources. While many potential benefits are associated with robotic care technologies, there is a variety of ethical challenges. To support steps towards a responsible implementation and use, this review develops an overview of ethical aspects of the use of social robots in elderly care from a decision-makers' perspective. Methods: Electronic databases were queried using a comprehensive search strategy based on the key concepts of "ethical aspects", "social robotics" and "elderly care". Abstract and title screening was conducted by two authors independently. Full-text screening was conducted by one author following a joint consolidation phase. Data was extracted using MAXQDA24 by one author, based on a consolidated coding framework. Analysis was performed through modified qualitative content analysis. Results: A total of 1,518 publications were screened, and 248 publications were included. We have organized our analysis in a scheme of ethical hazards, ethical opportunities and unsettled questions, identifying at least 60 broad ethical aspects affecting three different stakeholder groups. While some ethical issues are well-known and broadly discussed, our analysis shows a plethora of potentially relevant aspects, often only marginally recognized, that are worthy of consideration from a practical perspective. Discussion: The findings highlight the need for a contextual and detailed evaluation of implementation scenarios. To make use of the vast knowledge of the ethical discourse, we hypothesize that decision-makers need to understand the specific nature of this discourse to be able to engage in careful ethical deliberation.