Wang, Ziyu
TransECG: Leveraging Transformers for Explainable ECG Re-identification Risk Analysis
Wang, Ziyu, Khatibi, Elahe, Kazemi, Kianoosh, Azimi, Iman, Mousavi, Sanaz, Malik, Shaista, Rahmani, Amir M.
Electrocardiogram (ECG) signals are widely shared across multiple clinical applications for diagnosis, health monitoring, and biometric authentication. While valuable for healthcare, they also carry unique biometric identifiers that pose privacy risks, especially when ECG data are shared across multiple entities. These risks are amplified in shared environments, where re-identification threats can compromise patient privacy. Existing deep learning re-identification models prioritize accuracy but lack explainability, making it challenging to understand how the unique biometric characteristics encoded within ECG signals are recognized and exploited for identification. Without these insights, developing secure and trustworthy ECG data-sharing frameworks remains difficult despite high accuracy, especially in diverse, multi-source environments. In this work, we introduce TransECG, a Vision Transformer (ViT)-based method that uses attention mechanisms to pinpoint the critical ECG segments associated with re-identification tasks such as gender, age, and participant ID prediction. Our approach achieves high accuracy (89.9% for gender, 89.9% for age, and 88.6% for ID re-identification) across four real-world datasets with 87 participants. Importantly, we provide key insights into the roles of ECG components such as the R-wave, QRS complex, and P-Q interval in re-identification. For example, in gender classification the R-wave contributed 58.29% of the model's attention, while in age classification the P-R interval contributed 46.29%. By combining high predictive performance with enhanced explainability, TransECG provides a robust solution for privacy-conscious ECG data sharing, supporting the development of secure and trusted healthcare data environments.
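As an illustration of the attention-based attribution described above, the sketch below aggregates a ViT's [CLS]-token attention over labeled ECG segments. The segment boundaries, the head/layer averaging, and the toy numbers are assumptions for illustration, not TransECG's exact pipeline.

    import numpy as np

    def attention_by_segment(cls_attention, segment_labels):
        """Share of [CLS] attention mass falling on each labeled ECG segment.

        cls_attention: (n_patches,) attention from the [CLS] token to each
            patch, e.g. averaged over heads and layers (assumed here).
        segment_labels: (n_patches,) strings such as "P", "QRS", "T" mapping
            each patch to the ECG component it covers.
        """
        weights = cls_attention / cls_attention.sum()
        return {seg: float(weights[segment_labels == seg].sum())
                for seg in np.unique(segment_labels)}

    # Toy example: 10 patches covering one heartbeat.
    attn = np.array([.02, .05, .08, .30, .25, .10, .05, .05, .05, .05])
    labels = np.array(["P", "P", "PQ", "QRS", "QRS", "QRS", "ST", "T", "T", "T"])
    print(attention_by_segment(attn, labels))  # {'QRS': 0.65, ...}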
Enhancing Object Detection Accuracy in Underwater Sonar Images through Deep Learning-based Denoising
Wang, Ziyu, Xue, Tao, Wang, Yanbin, Li, Jingyuan, Zhang, Haibin, Xu, Zhiqiang, Xu, Gaofei
Sonar image object detection is crucial for underwater robotics and other applications. However, various types of noise in sonar images can degrade detection accuracy. Denoising, a critical preprocessing step, aims to remove noise while retaining useful information so as to improve detection accuracy. Although deep learning-based denoising algorithms perform well on optical images, their application to underwater sonar images remains underexplored. This paper systematically evaluates the effectiveness of several deep learning-based denoising algorithms, originally designed for optical images, in the context of underwater sonar image object detection. We apply nine trained denoising models, each targeting a different type of noise, to images from five open-source sonar datasets, and then test the denoised images with four object detection algorithms. The results show that different denoising models have varying effects on detection performance, and that combining the strengths of multiple denoising models can optimize detection results by suppressing noise more effectively. Additionally, we adopt a multi-frame denoising technique, originally designed for optical images, that treats the outputs of multiple denoising models as multiple frames of the same scene and processes them jointly to exploit their complementary noise-reduction effects. Experimental results show that denoised sonar images improve the performance of object detection algorithms compared to the original sonar images.
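A minimal sketch of the multi-frame fusion idea described above: treat the outputs of several denoising models as frames of the same scene and fuse them pixel-wise. The median fusion rule and the model names in the usage comment are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def fuse_denoised_frames(frames, rule="median"):
        """frames: list of HxW (or HxWxC) float arrays from different denoisers."""
        stack = np.stack(frames, axis=0)
        if rule == "median":
            return np.median(stack, axis=0)  # robust to any one model's artifacts
        return stack.mean(axis=0)            # simple averaging baseline

    # Hypothetical usage with three denoiser outputs of the same sonar image:
    # fused = fuse_denoised_frames([out_model_a, out_model_b, out_model_c])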
FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments
Zhang, Shiyong, Zhang, Xuebo, Dong, Qianli, Wang, Ziyu, Xi, Haobo, Yuan, Jing
In this paper, we propose a systematic framework for fast exploration of complex and large 3-D environments using micro aerial vehicles (MAVs). The key insight is the organic integration of frontier-based and sampling-based strategies to achieve rapid global exploration of the environment. Specifically, a field-of-view (FOV)-based frontier detector with guarantees of completeness and soundness is devised for identifying 3-D map frontiers. Unlike random sampling-based methods, a deterministic sampling technique is employed to build and maintain an incremental road map based on the recorded sensor FOVs and newly detected frontiers. With the resulting road map, we propose a two-stage path planner: it first quickly computes the globally optimal exploration path on the road map using a lazy evaluation strategy, and then smooths the best exploration path to further improve exploration efficiency. We validate the proposed method in both simulation and real-world experiments. The comparative results demonstrate the promising performance of our planner in terms of exploration efficiency, computational time, and explored volume.
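To make the lazy-evaluation idea concrete, here is a minimal sketch in which the road map is searched with optimistic edge costs and the expensive validity check (e.g., collision checking) runs only on edges of the best candidate path, replanning when an edge fails. The graph representation and toy example are assumptions, not FSMP's actual data structures.

    import heapq

    def lazy_shortest_path(graph, source, target, edge_valid):
        """graph: {node: {neighbor: cost}}; edge_valid(u, v) -> bool (expensive)."""
        blocked = set()
        while True:
            # Standard Dijkstra search, skipping edges already proven invalid.
            dist, prev = {source: 0.0}, {}
            pq = [(0.0, source)]
            while pq:
                d, u = heapq.heappop(pq)
                if u == target:
                    break
                if d > dist.get(u, float("inf")):
                    continue
                for v, c in graph[u].items():
                    if (u, v) in blocked or d + c >= dist.get(v, float("inf")):
                        continue
                    dist[v], prev[v] = d + c, u
                    heapq.heappush(pq, (d + c, v))
            if target not in dist:
                return None
            path, node = [target], target
            while node != source:
                node = prev[node]
                path.append(node)
            path.reverse()
            # Lazily validate only the edges on the candidate path.
            bad = [(u, v) for u, v in zip(path, path[1:]) if not edge_valid(u, v)]
            if not bad:
                return path
            blocked.update(bad + [(v, u) for u, v in bad])

    # Toy usage: the optimistic shortcut B->C turns out to be invalid.
    g = {"A": {"B": 1, "D": 1}, "B": {"C": 1}, "D": {"C": 3}, "C": {}}
    print(lazy_shortest_path(g, "A", "C", lambda u, v: (u, v) != ("B", "C")))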
Skewed Memorization in Large Language Models: Quantification and Decomposition
Li, Hao, Huang, Di, Wang, Ziyu, Rahmani, Amir M.
Memorization in Large Language Models (LLMs) poses privacy and security risks, as models may unintentionally reproduce sensitive or copyrighted data. Existing analyses focus on average-case scenarios, often neglecting the highly skewed distribution of memorization. This paper examines memorization in LLM supervised fine-tuning (SFT), exploring its relationships with training duration, dataset size, and inter-sample similarity. By analyzing memorization probabilities over sequence lengths, we link this skewness to the token generation process, offering insights for estimating memorization and comparing it to established metrics. Through theoretical analysis and empirical evaluation, we provide a comprehensive understanding of memorization behaviors and propose strategies to detect and mitigate risks, contributing to more privacy-preserving LLMs.
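A minimal sketch of how such skew can be surfaced: score each training sequence with a length-normalized likelihood under the fine-tuned model (a common memorization proxy, assumed here rather than the paper's exact metric) and compute the skewness of the resulting score distribution.

    import math

    def sequence_memorization_score(token_logprobs):
        """token_logprobs: per-token log p(x_t | x_<t) for one sequence."""
        return math.exp(sum(token_logprobs) / len(token_logprobs))

    def skewness(xs):
        n = len(xs)
        mean = sum(xs) / n
        sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
        return sum(((x - mean) / sd) ** 3 for x in xs) / n

    # Toy data: per-token log-probabilities for three training sequences.
    corpus_token_logprobs = [
        [-0.1, -0.2, -0.1],   # near-memorized outlier
        [-2.3, -1.9, -2.5],   # typical sequence
        [-2.2, -2.0, -2.6],
    ]
    scores = [sequence_memorization_score(lp) for lp in corpus_token_logprobs]
    print("skewness of memorization scores:", skewness(scores))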
Real-World Offline Reinforcement Learning from Vision Language Model Feedback
Venkataraman, Sreyas, Wang, Yufei, Wang, Ziyu, Erickson, Zackory, Held, David
Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions. This makes it ideal for real-world robots and safety-critical scenarios, where collecting online data or expert demonstrations is slow, costly, and risky. However, most existing offline RL works assume the dataset is already labeled with task rewards, a process that often requires significant human effort, especially when ground-truth states are hard to ascertain (e.g., in the real world). In this paper, we build on prior work, specifically RL-VLM-F, and propose a novel system that automatically generates reward labels for offline datasets using preference feedback from a vision-language model and a text description of the task. Our method then learns a policy using offline RL with the reward-labeled dataset. We demonstrate the system's applicability to a complex real-world robot-assisted dressing task, where we first learn a reward function using a vision-language model on a sub-optimal offline dataset, and then use the learned reward with Implicit Q-Learning to develop an effective dressing policy. Our method also performs well in simulation tasks involving the manipulation of rigid and deformable objects, and significantly outperforms baselines such as behavior cloning and inverse RL. In summary, we propose a new system that enables automatic reward labeling and policy learning from unlabeled, sub-optimal offline datasets.
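As a sketch of the preference-based reward-learning step, the snippet below fits a small reward network with a Bradley-Terry loss on pairwise labels, the standard objective in this family of methods. The network size, features, and randomly generated labels are toy assumptions, and the VLM querying step is abstracted into `prefs`.

    import torch
    import torch.nn as nn

    reward_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

    # prefs: (obs_a, obs_b, label) with label = 1.0 if the VLM preferred
    # segment a over b given the task description; random toy data here.
    prefs = [(torch.randn(8), torch.randn(8), torch.randint(0, 2, ()).float())
             for _ in range(256)]

    for epoch in range(20):
        for obs_a, obs_b, label in prefs:
            r_a, r_b = reward_net(obs_a), reward_net(obs_b)
            # Bradley-Terry: P(a preferred over b) = sigmoid(r_a - r_b).
            loss = nn.functional.binary_cross_entropy_with_logits(
                (r_a - r_b).squeeze(), label)
            opt.zero_grad(); loss.backward(); opt.step()

    # The learned reward_net then labels the offline dataset for IQL training.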
HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations
Wang, Ziyu, Li, Hao, Huang, Di, Rahmani, Amir M.
In digital healthcare, large language models (LLMs) have primarily been utilized to enhance question-answering capabilities and improve patient interactions. However, effective patient care necessitates LLM chains that can actively gather information by posing relevant questions. This paper presents HealthQ, a novel framework designed to evaluate the questioning capabilities of LLM healthcare chains. We implemented several LLM chains, including Retrieval-Augmented Generation (RAG), Chain of Thought (CoT), and reflective chains, and introduced an LLM judge to assess the relevance and informativeness of the generated questions. To validate HealthQ, we employed traditional Natural Language Processing (NLP) metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Named Entity Recognition (NER)-based set comparison, and constructed two custom datasets from public medical note datasets, ChatDoctor and MTS-Dialog. Our contributions are threefold: we provide the first comprehensive study on the questioning capabilities of LLMs in healthcare conversations, develop a novel dataset generation pipeline, and propose a detailed evaluation methodology.
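An illustrative sketch of the LLM-judge step: ask a judge model to score a generated follow-up question for relevance and informativeness given the patient note. `call_llm` stands in for any chat-completion client, and the prompt wording is an assumption, not HealthQ's actual implementation.

    import json

    JUDGE_PROMPT = """You are evaluating a clinician-bot's follow-up question.
    Patient note: {note}
    Generated question: {question}
    Rate relevance and informativeness from 1 to 5 and reply as JSON:
    {{"relevance": <int>, "informativeness": <int>}}"""

    def judge_question(call_llm, note, question):
        reply = call_llm(JUDGE_PROMPT.format(note=note, question=question))
        scores = json.loads(reply)
        return scores["relevance"], scores["informativeness"]

    # Usage, with any LLM client wrapped as call_llm: str -> str:
    # rel, info = judge_question(call_llm, note_text,
    #                            "How long have you had the cough?")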
Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach
Aqajari, Seyed Amir Hossein, Wang, Ziyu, Tazarv, Ali, Labbaf, Sina, Jafarlou, Salar, Nguyen, Brenda, Dutt, Nikil, Levorato, Marco, Rahmani, Amir M.
In today's fast-paced world, accurately monitoring stress levels is crucial. Sensor-based stress monitoring systems often need large datasets to train effective models, yet individual-specific models are necessary for personalized and interactive scenarios. Traditional methods such as Ecological Momentary Assessments (EMAs) can capture stress labels but struggle to collect data efficiently without burdening users. The challenge is to send EMAs at the right time, especially during stress episodes, balancing monitoring efficiency against user convenience. This paper introduces a novel context-aware active reinforcement learning (RL) algorithm for enhanced stress detection using Photoplethysmography (PPG) data from smartwatches and contextual data from smartphones. Our approach dynamically selects optimal times for deploying EMAs, using the user's immediate context to maximize label accuracy and minimize intrusiveness. Initially, the study was executed in an offline environment to refine the label collection process, aiming to increase accuracy while reducing user burden. We then integrated a real-time label collection mechanism, transitioning to an online methodology. This shift resulted in an 11% improvement in stress detection efficiency, and incorporating contextual data improved model accuracy by 4%. Personalization studies indicated a 10% enhancement in AUC-ROC scores, demonstrating better stress level differentiation. This research marks a significant step towards personalized, context-driven, real-time stress monitoring methods.
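The paper's querying policy is learned with RL, but a minimal rule-based sketch conveys the logic: trigger an EMA only when the stress classifier is uncertain and the phone context suggests a low interruption cost. The thresholds, context schema, and budget rule are illustrative assumptions, not the tuned policy.

    def should_send_ema(stress_prob, context, budget_left,
                        uncertainty_band=(0.35, 0.65)):
        """stress_prob: model's P(stress) from PPG features.
        context: smartphone signals, e.g. {"screen_on": True,
            "in_meeting": False} (assumed schema).
        budget_left: remaining EMA prompts allowed today."""
        if budget_left <= 0:
            return False
        uncertain = uncertainty_band[0] <= stress_prob <= uncertainty_band[1]
        convenient = (context.get("screen_on", False)
                      and not context.get("in_meeting", True))
        return uncertain and convenient

    print(should_send_ema(0.5, {"screen_on": True, "in_meeting": False}, 3))  # True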
Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints
Wu, Yuxuan, Wang, Ziyu, Raj, Bhiksha, Xia, Gus
We contribute an unsupervised method that effectively learns from raw observations and disentangles its latent space into content and style representations. Unlike most disentanglement algorithms that rely on domain-specific labels and knowledge, our method is based on the insight of domain-general statistical differences between content and style: content varies more among different fragments within a sample but maintains an invariant vocabulary across data samples, whereas style remains relatively invariant within a sample but exhibits more significant variation across different samples. We integrate this inductive bias into an encoder-decoder architecture and name our method V3 (variance-versus-invariance). Experimental results show that V3 generalizes across two distinct domains in different modalities, music audio and images of written digits, successfully learning pitch-timbre and digit-color disentanglement, respectively. Moreover, its disentanglement robustness significantly outperforms baseline unsupervised methods and is even comparable to supervised counterparts. Furthermore, symbolic-level interpretability emerges in the learned codebook of content, forging a near one-to-one alignment between machine representation and human knowledge.
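A minimal sketch of the variance-versus-invariance bias as a training signal: given per-fragment content and style codes, penalize content for varying across samples relative to within a sample, and penalize style for varying within a sample relative to across samples. The tensor shapes and the ratio-style loss are assumptions, not V3's exact objective.

    import torch

    def variance_stats(codes):
        """codes: (batch, fragments, dim) latent codes from the encoder."""
        within = codes.var(dim=1).mean()              # across fragments, per sample
        across = codes.mean(dim=1).var(dim=0).mean()  # across samples
        return within, across

    def v3_style_content_loss(content, style, eps=1e-6):
        c_within, c_across = variance_stats(content)
        s_within, s_across = variance_stats(style)
        # Content should vary within a sample; style should vary across samples.
        return c_across / (c_within + eps) + s_within / (s_across + eps)

    loss = v3_style_content_loss(torch.randn(16, 8, 32), torch.randn(16, 8, 32))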
On Subjective Uncertainty Quantification and Calibration in Natural Language Generation
Wang, Ziyu, Holmes, Chris
In free-form natural language generation (NLG), many different responses can serve the task's purpose equally well. An example is question answering (QA): given a question from the user, the model may provide a brief answer, but it may also follow with supporting facts and explanations that vary in form and detail. The user can be satisfied by a wide variety of responses, irrespective of their style or (to some extent) the choice of supporting facts included. Free-form NLG thus poses significant challenges to uncertainty quantification: some aspects of generation are irrelevant to the task's purpose and best excluded from uncertainty quantification, yet we often cannot characterize them precisely. If left unaddressed, however, the model's variation in these irrelevant aspects may dominate standard uncertainty measures such as token-level entropy (Kuhn et al., 2023), making them uninformative about the model's actual performance on the task. A recent line of work starting from Kuhn et al. (2023) (Lin et al., 2024; Zhang et al., 2023; Aichberger et al., 2024) studied this issue and proposed measuring the "semantic uncertainty" of generation, where "semantics" is defined by the equivalence classes of textual responses that logically entail one another. Empirical improvements on downstream tasks evidenced their contributions and highlighted the importance of task-specific uncertainty quantification, but important conceptual and practical issues remain. From a practical perspective, semantic equivalence is estimated using machine learning models, resulting in imprecise estimates that do not necessarily define an equivalence relation.
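A sketch of the semantic-uncertainty computation discussed above, in the sense of Kuhn et al. (2023): cluster sampled responses into equivalence classes by mutual entailment, then take the entropy of the cluster frequencies. `entails` stands in for an NLI model; as the text notes, its imprecision means the induced relation need not be a true equivalence.

    import math
    from collections import Counter

    def semantic_entropy(responses, entails):
        """responses: sampled strings; entails(a, b) -> bool."""
        clusters, assignment = [], []
        for r in responses:
            for idx, rep in enumerate(clusters):
                if entails(r, rep) and entails(rep, r):  # mutual entailment
                    assignment.append(idx)
                    break
            else:
                clusters.append(r)
                assignment.append(len(clusters) - 1)
        n = len(responses)
        return -sum((c / n) * math.log(c / n)
                    for c in Counter(assignment).values())

    # Toy check, with exact string match standing in for entailment:
    print(semantic_entropy(["Paris", "Paris", "Lyon"], lambda a, b: a == b))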
Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
Falck, Fabian, Wang, Ziyu, Holmes, Chris
In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLMs): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypothesis from a new angle through the martingale property, a fundamental requirement of a Bayesian learning system for exchangeable data. We show that the martingale property is a necessary condition for unambiguous predictions in such scenarios, and that it enables a principled, decomposed notion of uncertainty that is vital in trustworthy, safety-critical systems. We derive actionable checks, with corresponding theory and test statistics, which must hold if the martingale property is satisfied. We also examine whether uncertainty in LLMs decreases as expected in Bayesian learning when more data is observed. In three experiments, we provide evidence for violations of the martingale property and for deviations from Bayesian scaling behaviour of uncertainty, falsifying the hypothesis that ICL is Bayesian.
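One of the actionable checks can be sketched as follows: under exchangeable Bayesian updating, appending a model-sampled observation to the context should leave the expected next prediction unchanged on average. `sample_next`, which abstracts an LLM's in-context predictive distribution as numeric draws, and the toy Gaussian sampler are assumptions for illustration.

    import random
    import statistics

    def martingale_gap(sample_next, context, n_draws=2000):
        """sample_next(context) -> one draw from p(x_{n+1} | context)."""
        base = statistics.mean(sample_next(context) for _ in range(n_draws))
        extended = []
        for _ in range(n_draws):
            x = sample_next(context)                     # imputed observation
            extended.append(sample_next(context + [x]))  # predict after appending
        # Near zero if the martingale property holds; large gaps falsify it.
        return statistics.mean(extended) - base

    # Toy exchangeable sampler: Gaussian centred at the running context mean.
    def toy_sampler(ctx):
        mu = statistics.mean(ctx) if ctx else 0.0
        return random.gauss(mu, 1.0)

    print(martingale_gap(toy_sampler, [0.2, -0.1, 0.4]))  # approx. 0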