Overview
Prime Convolutional Model: Breaking the Ground for Theoretical Explainability
Panelli, Francesco, Almhaithawi, Doaa, Cerquitelli, Tania, Bellini, Alessandro
In this paper, we propose a new theoretical approach to Explainable AI. Following the Scientific Method, this approach consists in formulating on the basis of empirical evidence, a mathematical model to explain and predict the behaviors of Neural Networks. We apply the method to a case study created in a controlled environment, which we call Prime Convolutional Model (p-Conv for short). p-Conv operates on a dataset consisting of the first one million natural numbers and is trained to identify the congruence classes modulo a given integer $m$. Its architecture uses a convolutional-type neural network that contextually processes a sequence of $B$ consecutive numbers to each input. We take an empirical approach and exploit p-Conv to identify the congruence classes of numbers in a validation set using different values for $m$ and $B$. The results show that the different behaviors of p-Conv (i.e., whether it can perform the task or not) can be modeled mathematically in terms of $m$ and $B$. The inferred mathematical model reveals interesting patterns able to explain when and why p-Conv succeeds in performing task and, if not, which error pattern it follows.
VWAP Execution with Signature-Enhanced Transformers: A Multi-Asset Learning Approach
In this paper I propose a novel approach to Volume Weighted Average Price (VWAP) execution that addresses two key practical challenges: the need for asset-specific model training and the capture of complex temporal dependencies. Building upon my recent work in dynamic VWAP execution arXiv:2502.18177, I demonstrate that a single neural network trained across multiple assets can achieve performance comparable to or better than traditional asset-specific models. The proposed architecture combines a transformer-based design inspired by arXiv:2406.02486 with path signatures for capturing geometric features of price-volume trajectories, as in arXiv:2406.17890. The empirical analysis, conducted on hourly cryptocurrency trading data from 80 trading pairs, shows that the globally-fitted model with signature features (GFT-Sig) achieves superior performance in both absolute and quadratic VWAP loss metrics compared to asset-specific approaches. Notably, these improvements persist for out-of-sample assets, demonstrating the model's ability to generalize across different market conditions. The results suggest that combining global parameter sharing with signature-based feature extraction provides a scalable and robust approach to VWAP execution, offering significant practical advantages over traditional asset-specific implementations.
Reflection on Data Storytelling Tools in the Generative AI Era from the Human-AI Collaboration Perspective
Li, Haotian, Wang, Yun, Qu, Huamin
To lower the barrier of telling appealing and effective stories, researchers have spent considerable efforts to build AI-powered tools to facilitate their creation and communication with different strategies of human-AI collaboration [29]. In these tools, AI collaborators are often powered by heuristic-based methods [57], traditional machine learning models [13], or smaller-scale deep learning models [36]. Compared to the previous techniques for AI collaborators, the recently emerging large-scale generative AI models, including the text-to-image models [62] and large language models (LLMs) [69], can achieve better performance on various data storytelling-related tasks, such as data analysis [20] and text generation [72], and enhance the communication between humans and AI with conversations. These advantages indicate their potentials to be game-changers in the research direction of human-AI collaboration for data storytelling, including improving the experience of collaborating with AI and diversifying the collaboration patterns between humans and AI [29]. After two years of the public access of these models, it is a critical time point to reflect how this research discipline progresses in the new era of large-scale generative AI models and identify future opportunities. To achieve the goal, it is essential not only to focus on how these generative AI techniques are applied in existing tools, as explored in a previous survey [17], but more importantly, to compare the human-AI collaboration patterns in the latest tools in the generative AI era with those in earlier ones. Only through this comparison can we understand the shift in human-AI collaboration paradigms, identify the value of these powerful techniques in enhancing human-AI collaboration, and propose future research directions.
RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour
Serpiva, Valerii, Lykov, Artem, Myshlyaev, Artyom, Khan, Muhammad Haris, Abdulkarim, Ali Alridha, Sautenkov, Oleg, Tsetserukou, Dzmitry
RaceVLA presents an innovative approach for autonomous racing drone navigation by leveraging Visual-Language-Action (VLA) to emulate human-like behavior. This research explores the integration of advanced algorithms that enable drones to adapt their navigation strategies based on real-time environmental feedback, mimicking the decision-making processes of human pilots. The model, fine-tuned on a collected racing drone dataset, demonstrates strong generalization despite the complexity of drone racing environments. RaceVLA outperforms OpenVLA in motion (75.0 vs 60.0) and semantic generalization (45.5 vs 36.3), benefiting from the dynamic camera and simplified motion tasks. However, visual (79.6 vs 87.0) and physical (50.0 vs 76.7) generalization were slightly reduced due to the challenges of maneuvering in dynamic environments with varying object sizes. RaceVLA also outperforms RT-2 across all axes - visual (79.6 vs 52.0), motion (75.0 vs 55.0), physical (50.0 vs 26.7), and semantic (45.5 vs 38.8), demonstrating its robustness for real-time adjustments in complex environments. Experiments revealed an average velocity of 1.04 m/s, with a maximum speed of 2.02 m/s, and consistent maneuverability, demonstrating RaceVLA's ability to handle high-speed scenarios effectively. These findings highlight the potential of RaceVLA for high-performance navigation in competitive racing contexts. The RaceVLA codebase, pretrained weights, and dataset are available at this http URL: https://racevla.github.io/
A Systematic Literature Review on Safety of the Intended Functionality for Automated Driving Systems
Patel, Milin, Jung, Rolf, Khatun, Marzana
In the automobile industry, ensuring the safety of automated vehicles equipped with the Automated Driving System (ADS) is becoming a significant focus due to the increasing development and deployment of automated driving. Automated driving depends on sensing both the external and internal environments of a vehicle, utilizing perception sensors and algorithms, and Electrical/Electronic (E/E) systems for situational awareness and response. ISO 21448 is the standard for Safety of the Intended Functionality (SOTIF) that aims to ensure that the ADS operate safely within their intended functionality. SOTIF focuses on preventing or mitigating potential hazards that may arise from the limitations or failures of the ADS, including hazards due to insufficiencies of specification, or performance insufficiencies, as well as foreseeable misuse of the intended functionality. However, the challenge lies in ensuring the safety of vehicles despite the limited availability of extensive and systematic literature on SOTIF. To address this challenge, a Systematic Literature Review (SLR) on SOTIF for the ADS is performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The objective is to methodically gather and analyze the existing literature on SOTIF. The major contributions of this paper are: (i) presenting a summary of the literature by synthesizing and organizing the collective findings, methodologies, and insights into distinct thematic groups, and (ii) summarizing and categorizing the acknowledged limitations based on data extracted from an SLR of 51 research papers published between 2018 and 2023. Furthermore, research gaps are determined, and future research directions are proposed.
PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset
Asif, Haider, Basit, Abdul, Innan, Nouhaila, Kashif, Muhammad, Marchisio, Alberto, Shafique, Muhammad
Large Language Models (LLMs) offer remarkable capabilities in code generation, natural language processing, and domain-specific reasoning. Their potential in aiding quantum software development remains underexplored, particularly for the PennyLane framework-a leading platform for hybrid quantum-classical computing. To address this gap, we introduce a novel, high-quality dataset comprising 3,347 PennyLane-specific code samples of quantum circuits and their contextual descriptions, specifically curated to train/fine-tune LLM-based quantum code assistance. Our key contributions are threefold: (1) the automatic creation and open-source release of a comprehensive PennyLane dataset leveraging quantum computing textbooks, official documentation, and open-source repositories; (2) the development of a systematic methodology for data refinement, annotation, and formatting to optimize LLM training efficiency; and (3) a thorough evaluation, based on a Retrieval-Augmented Generation (RAG) framework, demonstrating the effectiveness of our dataset in streamlining PennyLane code generation and improving quantum development workflows. Compared to existing efforts that predominantly focus on Qiskit, our dataset significantly broadens the spectrum of quantum frameworks covered in AI-driven code assistance. By bridging this gap and providing reproducible dataset-creation methodologies, we aim to advance the field of AI-assisted quantum programming, making quantum computing more accessible to both newcomers and experienced developers.
Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations
Yang, Yuhao, Ji, Zhi, Li, Zhaopeng, Li, Yi, Mo, Zhonglin, Ding, Yue, Chen, Kai, Zhang, Zijian, Li, Jie, Li, Shuanglong, Liu, Lin
Generative models have recently gained attention in recommendation systems by directly predicting item identifiers from user interaction sequences. However, existing methods suffer from significant information loss due to the separation of stages such as quantization and sequence modeling, hindering their ability to achieve the modeling precision and accuracy of sequential dense retrieval techniques. Integrating generative and dense retrieval methods remains a critical challenge. To address this, we introduce the Cascaded Organized Bi-Represented generAtive retrieval (COBRA) framework, which innovatively integrates sparse semantic IDs and dense vectors through a cascading process. Our method alternates between generating these representations by first generating sparse IDs, which serve as conditions to aid in the generation of dense vectors. End-to-end training enables dynamic refinement of dense representations, capturing both semantic insights and collaborative signals from user-item interactions. During inference, COBRA employs a coarse-to-fine strategy, starting with sparse ID generation and refining them into dense vectors via the generative model. We further propose BeamFusion, an innovative approach combining beam search with nearest neighbor scores to enhance inference flexibility and recommendation diversity. Extensive experiments on public datasets and offline tests validate our method's robustness. Online A/B tests on a real-world advertising platform with over 200 million daily users demonstrate substantial improvements in key metrics, highlighting COBRA's practical advantages.
Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization
Qiu, Yilun, Zhao, Xiaoyan, Zhang, Yang, Bai, Yimeng, Wang, Wenjie, Cheng, Hong, Feng, Fuli, Chua, Tat-Seng
Personalizing Large Language Models (LLMs) has become a critical step in facilitating their widespread application to enhance individual life experiences. In pursuit of personalization, distilling key preference information from an individual's historical data as instructional preference context to customize LLM generation has emerged as a promising direction. However, these methods face a fundamental limitation by overlooking the inter-user comparative analysis, which is essential for identifying the inter-user differences that truly shape preferences. To address this limitation, we propose Difference-aware Personalization Learning (DPL), a novel approach that emphasizes extracting inter-user differences to enhance LLM personalization. DPL strategically selects representative users for comparison and establishes a structured standard to extract meaningful, task-relevant differences for customizing LLM generation. Extensive experiments on real-world datasets demonstrate that DPL significantly enhances LLM personalization. We release our code at https://github.com/SnowCharmQ/DPL.
A Comprehensive Survey on Composed Image Retrieval
Song, Xuemeng, Lin, Haoqiang, Wen, Haokun, Hou, Bohan, Xu, Mingzhu, Nie, Liqiang
Composed Image Retrieval (CIR) is an emerging yet challenging task that allows users to search for target images using a multimodal query, comprising a reference image and a modification text specifying the user's desired changes to the reference image. Given its significant academic and practical value, CIR has become a rapidly growing area of interest in the computer vision and machine learning communities, particularly with the advances in deep learning. To the best of our knowledge, there is currently no comprehensive review of CIR to provide a timely overview of this field. Therefore, we synthesize insights from over 120 publications in top conferences and journals, including ACM TOIS, SIGIR, and CVPR In particular, we systematically categorize existing supervised CIR and zero-shot CIR models using a fine-grained taxonomy. For a comprehensive review, we also briefly discuss approaches for tasks closely related to CIR, such as attribute-based CIR and dialog-based CIR. Additionally, we summarize benchmark datasets for evaluation and analyze existing supervised and zero-shot CIR methods by comparing experimental results across multiple datasets. Furthermore, we present promising future directions in this field, offering practical insights for researchers interested in further exploration. The curated collection of related works is maintained and continuously updated in https://github.com/haokunwen/Awesome-Composed-Image-Retrieval.
Lossy Neural Compression for Geospatial Analytics: A Review
Gomes, Carlos, Wittmann, Isabelle, Robert, Damien, Jakubik, Johannes, Reichelt, Tim, Martone, Michele, Maurogiovanni, Stefano, Vinge, Rikard, Hurst, Jonas, Scheurer, Erik, Sedona, Rocco, Brunschwiler, Thomas, Kesselheim, Stefan, Batic, Matej, Stier, Philip, Wegner, Jan Dirk, Cavallaro, Gabriele, Pebesma, Edzer, Marszalek, Michael, Belenguer-Plomer, Miguel A, Adriko, Kennedy, Fraccaro, Paolo, Kienzler, Romeo, Briq, Rania, Benassou, Sabrina, Lazzarini, Michele, Albrecht, Conrad M
Over the past decades, there has been an explosion in the amount of available Earth Observation (EO) data. The unprecedented coverage of the Earth's surface and atmosphere by satellite imagery has resulted in large volumes of data that must be transmitted to ground stations, stored in data centers, and distributed to end users. Modern Earth System Models (ESMs) face similar challenges, operating at high spatial and temporal resolutions, producing petabytes of data per simulated day. Data compression has gained relevance over the past decade, with neural compression (NC) emerging from deep learning and information theory, making EO data and ESM outputs ideal candidates due to their abundance of unlabeled data. In this review, we outline recent developments in NC applied to geospatial data. We introduce the fundamental concepts of NC including seminal works in its traditional applications to image and video compression domains with focus on lossy compression. We discuss the unique characteristics of EO and ESM data, contrasting them with "natural images", and explain the additional challenges and opportunities they present. Moreover, we review current applications of NC across various EO modalities and explore the limited efforts in ESM compression to date. The advent of self-supervised learning (SSL) and foundation models (FM) has advanced methods to efficiently distill representations from vast unlabeled data. We connect these developments to NC for EO, highlighting the similarities between the two fields and elaborate on the potential of transferring compressed feature representations for machine--to--machine communication. Based on insights drawn from this review, we devise future directions relevant to applications in EO and ESM.