Overview
Energy-Latency Attacks: A New Adversarial Threat to Deep Learning
Meftah, Hanene F. Z. Brachemi, Hamidouche, Wassim, Fezza, Sid Ahmed, Deforges, Olivier
The growing computational demand for deep neural networks ( DNNs) has raised concerns about their energy consumption and carbon footprint, particularly as the size and complexity of the models continue to increase. To address these challenges, energy-efficient hardware and custom accelerators have become essential. Additionally, adaptable DNN s are being developed to dynamically balance performance and efficiency. The use of these strategies became more common to enable sustainable AI deployment. However, these efficiency-focused designs may also introduce vulnerabilities, as attackers can potentially exploit them to increase latency and energy usage by triggering their worst-case-performance scenarios. This new type of attack, called energy-latency attacks, has recently gained significant research attention, focusing on the vulnerability of DNN s to this emerging attack paradigm, which can trigger denial-of-service ( DoS) attacks. This paper provides a comprehensive overview of current research on energy-latency attacks, categorizing them using the established taxonomy for traditional adversarial attacks. We explore different metrics used to measure the success of these attacks and provide an analysis and comparison of existing attack strategies. We also analyze existing defense mechanisms and highlight current challenges and potential areas for future research in this developing field. The GitHub page for this work can be accessed at https://github.com/hbrachemi/Survey_energy_attacks/
Architecture for a Trustworthy Quantum Chatbot
Aragonรฉs-Soria, Yaiza, Oriol, Manuel
Large language model (LLM)-based tools such as ChatGPT seem useful for classical programming assignments. The more specialized the field, the more likely they lack reliability because of the lack of data to train them. In the case of quantum computing, the quality of answers of generic chatbots is low. C4Q is a chatbot focused on quantum programs that addresses this challenge through a software architecture that integrates specialized LLMs to classify requests and specialized question answering modules with a deterministic logical engine to provide trustworthy quantum computing support. This article describes the latest version (2.0) of C4Q, which delivers several enhancements: ready-to-run Qiskit code for gate definitions and circuit operations, expanded features to solve software engineering tasks such as the travelling salesperson problem and the knapsack problem, and a feedback mechanism for iterative improvement. Extensive testing of the backend confirms the system's reliability, while empirical evaluations show that C4Q 2.0's classification LLM reaches near-perfect accuracy. The evaluation of the result consists in a comparative study with three existing chatbots highlighting C4Q 2.0's maintainability and correctness, reflecting on how software architecture decisions, such as separating deterministic logic from probabilistic text generation impact the quality of the results.
Memory Is All You Need: Testing How Model Memory Affects LLM Performance in Annotation Tasks
Timoneda, Joan C., Vera, Sebastiรกn Vallejo
Generative Large Language Models (LLMs) have shown promising results in text annotation using zero-shot and few-shot learning. Yet these approaches do not allow the model to retain information from previous annotations, making each response independent from the preceding ones. This raises the question of whether model memory -- the LLM having knowledge about its own previous annotations in the same task -- affects performance. In this article, using OpenAI's GPT-4o and Meta's Llama 3.1 on two political science datasets, we demonstrate that allowing the model to retain information about its own previous classifications yields significant performance improvements: between 5 and 25\% when compared to zero-shot and few-shot learning. Moreover, memory reinforcement, a novel approach we propose that combines model memory and reinforcement learning, yields additional performance gains in three out of our four tests. These findings have important implications for applied researchers looking to improve performance and efficiency in LLM annotation tasks.
Simulating the Real World: A Unified Survey of Multimodal Generative Models
Hu, Yuqi, Wang, Longguang, Liu, Xian, Chen, Ling-Hao, Guo, Yuwei, Shi, Yukai, Liu, Ce, Rao, Anyi, Wang, Zeyu, Xiong, Hui
Understanding and replicating the real world is a critical challenge in Artificial General Intelligence (AGI) research. To achieve this, many existing approaches, such as world models, aim to capture the fundamental principles governing the physical world, enabling more accurate simulations and meaningful interactions. However, current methods often treat different modalities, including 2D (images), videos, 3D, and 4D representations, as independent domains, overlooking their interdependencies. Additionally, these methods typically focus on isolated dimensions of reality without systematically integrating their connections. In this survey, we present a unified survey for multimodal generative models that investigate the progression of data dimensionality in real-world simulation. Specifically, this survey starts from 2D generation (appearance), then moves to video (appearance+dynamics) and 3D generation (appearance+geometry), and finally culminates in 4D generation that integrate all dimensions. To the best of our knowledge, this is the first attempt to systematically unify the study of 2D, video, 3D and 4D generation within a single framework. To guide future research, we provide a comprehensive review of datasets, evaluation metrics and future directions, and fostering insights for newcomers. This survey serves as a bridge to advance the study of multimodal generative models and real-world simulation within a unified framework.
SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
Yan, Xiangchao, Feng, Shiyang, Yuan, Jiakang, Xia, Renqiu, Wang, Bin, Zhang, Bo, Bai, Lei
Survey paper plays a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by human remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to the retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge can automatically generate and refine the content of the generated article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.
HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
Urlana, Ashok, Kanumolu, Gopichand, Kumar, Charaka Vinayak, Garlapati, Bala Mallikarjunarao, Mishra, Rahul
Response consistency-based, reference-free hallucination detection (RFHD) methods do not depend on internal model states, such as generation probabilities or gradients, which Grey-box models typically rely on but are inaccessible in closed-source LLMs. However, their inability to capture query-response alignment patterns often results in lower detection accuracy. Additionally, the lack of large-scale benchmark datasets spanning diverse domains remains a challenge, as most existing datasets are limited in size and scope. To this end, we propose HalluCounter, a novel reference-free hallucination detection method that utilizes both response-response and query-response consistency and alignment patterns. This enables the training of a classifier that detects hallucinations and provides a confidence score and an optimal response for user queries. Furthermore, we introduce HalluCounterEval, a benchmark dataset comprising both synthetically generated and human-curated samples across multiple domains. Our method outperforms state-of-the-art approaches by a significant margin, achieving over 90\% average confidence in hallucination detection across datasets.
The Next Frontier of LLM Applications: Open Ecosystems and Hardware Synergy
Hou, Xinyi, Zhao, Yanjie, Wang, Haoyu
The second paradigm involves LLM agents developed using frameworks like LangChain [16], AutoGPT [11], Langroid [18], AutoGen [23], and LlamaIndex [22], which offer greater programmability and modularity, allowing developers to build sophisticated, multi-agent systems that integrate external tools and dynamic workflows [20]. Despite their advantages, both paradigms remain architecturally fragmented and lack standardized interoperability, leading to redundant development efforts and constrained scalability. From a software engineering (SE) perspective, current LLM application paradigms resemble traditional platform-centric software ecosystems, where applications are tightly coupled to proprietary APIs and execution environments. LLM app stores, while lowering the barrier to entry, impose constraints on extensibility and cross-platform interoperability, leading to vendor lock-in and duplicated development efforts across different ecosystems. In contrast, agent-based LLM frameworks provide modularity but lack standardized mechanisms for component reuse and integration, making it challenging to compose LLM applications that seamlessly operate across heterogeneous environments. This fragmentation mirrors historical challenges in SE, where monolithic architectures have given way to service-oriented and microservices-based designs to improve reusability, scalability, and maintainability. Another key limitation of existing LLM applications is inefficient hardware utilization.
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
Huang, Wenke, Liang, Jian, Guo, Xianda, Fang, Yiyang, Wan, Guancheng, Rong, Xuankun, Wen, Chi, Shi, Zekun, Li, Qingyun, Zhu, Didi, Ma, Yanbiao, Liang, Ke, Yang, Bin, Li, He, Shao, Jiawei, Ye, Mang, Du, Bo
Multi-modal Large Language Models (MLLMs) integrate visual and linguistic reasoning to address complex tasks such as image captioning and visual question answering. While MLLMs demonstrate remarkable versatility, MLLMs appears limited performance on special applications. But tuning MLLMs for downstream tasks encounters two key challenges: Task-Expert Specialization, where distribution shifts between pre-training and target datasets constrain target performance, and Open-World Stabilization, where catastrophic forgetting erases the model general knowledge. In this work, we systematically review recent advancements in MLLM tuning methodologies, classifying them into three paradigms: (I) Selective Tuning, (II) Additive Tuning, and (III) Reparameterization Tuning. Furthermore, we benchmark these tuning strategies across popular MLLM architectures and diverse downstream tasks to establish standardized evaluation analysis and systematic tuning principles. Finally, we highlight several open challenges in this domain and propose future research directions. To facilitate ongoing progress in this rapidly evolving field, we provide a public repository that continuously tracks developments: https://github.com/WenkeHuang/Awesome-MLLM-Tuning.
Dynamic Pricing for On-Demand DNN Inference in the Edge-AI Market
Li, Songyuan, Hu, Jia, Min, Geyong, Huang, Haojun, Huang, Jiwei
The convergence of edge computing and AI gives rise to Edge-AI, which enables the deployment of real-time AI applications and services at the network edge. One of the fundamental research issues in Edge-AI is edge inference acceleration, which aims to realize low-latency high-accuracy DNN inference services by leveraging the fine-grained offloading of partitioned inference tasks from end devices to edge servers. However, existing research has yet to adopt a practical Edge-AI market perspective, which would systematically explore the personalized inference needs of AI users (e.g., inference accuracy, latency, and task complexity), the revenue incentives for AI service providers that offer edge inference services, and multi-stakeholder governance within a market-oriented context. To bridge this gap, we propose an Auction-based Edge Inference Pricing Mechanism (AERIA) for revenue maximization to tackle the multi-dimensional optimization problem of DNN model partition, edge inference pricing, and resource allocation. We investigate the multi-exit device-edge synergistic inference scheme for on-demand DNN inference acceleration, and analyse the auction dynamics amongst the AI service providers, AI users and edge infrastructure provider. Owing to the strategic mechanism design via randomized consensus estimate and cost sharing techniques, the Edge-AI market attains several desirable properties, including competitiveness in revenue maximization, incentive compatibility, and envy-freeness, which are crucial to maintain the effectiveness, truthfulness, and fairness of our auction outcomes. The extensive simulation experiments based on four representative DNN inference workloads demonstrate that our AERIA mechanism significantly outperforms several state-of-the-art approaches in revenue maximization, demonstrating the efficacy of AERIA for on-demand DNN inference in the Edge-AI market.
Large Language Models in Bioinformatics: A Survey
Wang, Zhenyu, Wang, Zikang, Jiang, Jiyue, Chen, Pengan, Shi, Xiangyu, Li, Yu
Large Language Models (LLMs) are revolutionizing bioinformatics, enabling advanced analysis of DNA, RNA, proteins, and single-cell data. This survey provides a systematic review of recent advancements, focusing on genomic sequence modeling, RNA structure prediction, protein function inference, and single-cell transcriptomics. Meanwhile, we also discuss several key challenges, including data scarcity, computational complexity, and cross-omics integration, and explore future directions such as multimodal learning, hybrid AI models, and clinical applications. By offering a comprehensive perspective, this paper underscores the transformative potential of LLMs in driving innovations in bioinformatics and precision medicine.