Overview
Signal-to-Noise Ratio in Scanning Electron Microscopy: A Comprehensive Review
Sim, K. S., Bukhori, I., Ong, D. C. Y., Gan, K. B.
Scanning Electron Microscopy (SEM) is critical in nanotechnology, materials science, and biological imaging due to its high spatial resolution and depth of focus. Signal-to-noise ratio (SNR) is an essential parameter in SEM because it directly impacts the quality and interpretability of the images. SEM is widely used in various scientific disciplines, but its utility can be compromised by noise, which degrades image clarity. This review explores multiple aspects of the SEM imaging process, from the principal operation of SEM, sources of noise in SEM, methods for SNR measurement and estimations, to various aspects that affect the SNR measurement and approaches to enhance SNR, both from a hardware and software standpoint. We review traditional and emerging techniques, focusing on their applications, advantages, and limitations. The paper aims to provide a comprehensive understanding of SNR optimization in SEM for researchers and practitioners and to encourage further research in the field.
Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft
Liu, Peiyang, Cui, Ziqiang, Liang, Di, Ye, Wei
Retrieval-augmented generation (RAG) enhances Large Language Models (LLMs) by mitigating hallucinations and outdated information issues, yet simultaneously facilitates unauthorized data appropriation at scale. This paper addresses this challenge through two key contributions. First, we introduce RPD, a novel dataset specifically designed for RAG plagiarism detection that encompasses diverse professional domains and writing styles, overcoming limitations in existing resources. Second, we develop a dual-layered watermarking system that embeds protection at both semantic and lexical levels, complemented by an interrogator-detective framework that employs statistical hypothesis testing on accumulated evidence. Extensive experimentation demonstrates our approach's effectiveness across varying query volumes, defense prompts, and retrieval parameters, while maintaining resilience against adversarial evasion techniques. This work establishes a foundational framework for intellectual property protection in retrieval-augmented AI systems.
Large Language Models Meet Virtual Cell: A Survey
Li, Krinos, Xiao, Xianglu, Deng, Shenglong, He, Lucas, Zhong, Zijun, Zou, Yuanjie, Zhan, Zhonghao, Hui, Zheng, Bao, Weiye, Yang, Guang
Large language models (LLMs) are transforming cellular biology by enabling the development of "virtual cells"--computational systems that represent, predict, and reason about cellular states and behaviors. This work provides a comprehensive review of LLMs for virtual cell modeling. We propose a unified taxonomy that organizes existing methods into two paradigms: LLMs as Oracles, for direct cellular modeling, and LLMs as Agents, for orchestrating complex scientific tasks. We identify three core tasks--cellular representation, perturbation prediction, and gene regulation inference--and review their associated models, datasets, evaluation benchmarks, as well as the critical challenges in scalability, generalizability, and interpretability.
Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
Hu, Man, Wu, Xinyi, Suo, Zuofeng, Feng, Jinbo, Meng, Linghui, Jia, Yanhao, Luu, Anh Tuan, Zhao, Shuai
With the rise of advanced reasoning capabilities, large language models (LLMs) are receiving increasing attention. However, although reasoning improves LLMs' performance on downstream tasks, it also introduces new security risks, as adversaries can exploit these capabilities to conduct backdoor attacks. Existing surveys on backdoor attacks and reasoning security offer comprehensive overviews but lack in-depth analysis of backdoor attacks and defenses targeting LLMs' reasoning abilities. In this paper, we take the first step toward providing a comprehensive review of reasoning-based backdoor attacks in LLMs by analyzing their underlying mechanisms, methodological frameworks, and unresolved challenges. Specifically, we introduce a new taxonomy that offers a unified perspective for summarizing existing approaches, categorizing reasoning-based backdoor attacks into associative, passive, and active. We also present defense strategies against such attacks and discuss current challenges alongside potential directions for future research. This work offers a novel perspective, paving the way for further exploration of secure and trustworthy LLM communities.
Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks
Zong, Shilong, Bierly, Alex, Boker, Almuatazbellah, Eldardiry, Hoda
This review aims to conduct a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis include model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNN, as an emerging, biologically inspired, continuous-time dynamic neural network, demonstrates significant potential in handling noisy, non-stationary data, and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNN in terms of parameter efficiency and computational speed. However, RNN remains a cornerstone in sequence modeling due to its mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
Jeong, Soyeong, Jung, Taehee, Hwang, Sung Ju, Kim, Joo-Kyung, Kang, Dongyeop
Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt, enabling new opportunities for knowledge-intensive multi-hop reasoning by integrating large sets of retrieved documents or, in some cases, directly all necessary information. However, simply feeding more documents into the context window fails to capture how evidence should be connected. We address this gap with thought templates, which recast reasoning as reusable thought caches, derived from prior problem solving traces, structuring how evidence is combined and guiding multi-hop inference with factual documents. To keep these templates effective, we propose an update strategy that iteratively refines templates derived from training data through natural-language feedback. Across diverse benchmarks and LCLM families, our approach delivers consistent gains over strong baselines in both retrieval-based and retrieval-free settings. Furthermore, we show that optimized templates can be distilled into smaller open-source models, demonstrating its broad applicability and transparent reasoning reuse. We refer to our framework as Thought Template Augmented LCLMs (ToTAL).
Large Language Models Hallucination: A Comprehensive Survey
Alansari, Aisha, Luqman, Hamzah
Large language models (LLMs) have transformed natural language processing, achieving remarkable performance across diverse tasks. However, their impressive fluency often comes at the cost of producing false or fabricated information, a phenomenon known as hallucination. Hallucination refers to the generation of content by an LLM that is fluent and syntactically correct but factually inaccurate or unsupported by external evidence. Hallucinations undermine the reliability and trustworthiness of LLMs, especially in domains requiring factual accuracy. This survey provides a comprehensive review of research on hallucination in LLMs, with a focus on causes, detection, and mitigation. We first present a taxonomy of hallucination types and analyze their root causes across the entire LLM development lifecycle, from data collection and architecture design to inference. We further examine how hallucinations emerge in key natural language generation tasks. Building on this foundation, we introduce a structured taxonomy of detection approaches and another taxonomy of mitigation strategies. We also analyze the strengths and limitations of current detection and mitigation approaches and review existing evaluation benchmarks and metrics used to quantify LLMs hallucinations. Finally, we outline key open challenges and promising directions for future research, providing a foundation for the development of more truthful and trustworthy LLMs.
Tuning the Tuner: Introducing Hyperparameter Optimization for Auto-Tuning
Willemsen, Floris-Jan, van Nieuwpoort, Rob V., van Werkhoven, Ben
Abstract--Automatic performance tuning (auto-tuning) is widely used to optimize performance-critical applications across many scientific domains by finding the best program variant among many choices. Efficient optimization algorithms are crucial for navigating the vast and complex search spaces in auto-tuning. As is well known in the context of machine learning and similar fields, hyperparameters critically shape optimization algorithm efficiency. Y et for auto-tuning frameworks, these hyperparameters are almost never tuned, and their potential performance impact has not been studied. We present a novel method for general hyperparameter tuning of optimization algorithms for auto-tuning, thus "tuning the tuner". In particular, we propose a robust statistical method for evaluating hyperparameter performance across search spaces, publish a F AIR data set and software for reproducibility, and present a simulation mode that replays previously recorded tuning data, lowering the costs of hyperparameter tuning by two orders of magnitude. We show that even limited hyperparam-eter tuning can improve auto-tuner performance by 94.8% on average, and establish that the hyperparameters themselves can be optimized efficiently with meta-strategies (with an average improvement of 204.7%), demonstrating the often overlooked hyperparameter tuning as a powerful technique for advancing auto-tuning research and practice. UTOMA TIC performance tuning, or auto-tuning, is a widely established method for optimizing the performance of applications in many scientific domains, including radio astronomy [1]-[4], image processing [5]-[7], fluid dynamics [8]-[10], and climate modeling [11]-[13]. Auto-tuning automates the process of exploring the myriad of implementation choices that arise in performance optimization, such as the number of threads, tile sizes used in loop blocking, and other code optimization parameters [14]. At the heart of the auto-tuning method is a search space of functionally-equivalent code variants that is explored by an optimization algorithm.
A Survey of Reinforcement Learning for Large Reasoning Models
Zhang, Kaiyan, Zuo, Yuxin, He, Bingxiang, Sun, Youbang, Liu, Runze, Jiang, Che, Fan, Yuchen, Tian, Kai, Jia, Guoli, Li, Pengfei, Fu, Yu, Lv, Xingtai, Zhang, Yuchen, Zeng, Sihang, Qu, Shang, Li, Haozhan, Wang, Shijie, Wang, Yuru, Long, Xinwei, Liu, Fangfu, Xu, Xiang, Ma, Jiaze, Zhu, Xuekai, Hua, Ermo, Liu, Yihao, Li, Zonglin, Chen, Huayu, Qu, Xiaoye, Li, Yafu, Chen, Weize, Yuan, Zhenzhao, Gao, Junqi, Li, Dong, Ma, Zhiyuan, Cui, Ganqu, Liu, Zhiyuan, Qi, Biqing, Ding, Ning, Zhou, Bowen
In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into LRMs. With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. Github: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs
Towards Network Data Analytics in 5G Systems and Beyond
Romero, Marcos Lima, Suyama, Ricardo
Data has become a critical asset in the digital economy, yet it remains underutilized by Mobile Network Operators (MNOs), unlike Over-the-Top (OTT) players that lead global market valuations. To move beyond the commoditization of connectivity and deliver greater value to customers, data analytics emerges as a strategic enabler. Using data efficiently is essential for unlocking new service opportunities, optimizing operational efficiency, and mitigating operational and business risks. Since Release 15, the 3rd Generation Partnership Project (3GPP) has introduced the Network Data Analytics Function (NWDAF) to provide powerful insights and predictions using data collected across mobile networks, supporting both user-centric and network-oriented use cases. However, academic research has largely focused on a limited set of methods and use cases, driven by the availability of datasets, restricting broader exploration. This study analyzes trends and gaps in more than 70 articles and proposes two novel use cases to promote the adoption of NWDAF and explore its potential for monetization.