He, Ke
WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection
Shetty, Anudeex, Teng, Yue, He, Ke, Xu, Qiongkai
Embedding as a Service (EaaS) has become a widely adopted solution, which offers feature extraction capabilities for addressing various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; nevertheless, this concern could be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying the attack models post-publication. Through the analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and has been empirically shown to be effective against CSE attack.
Learning based signal detection for MIMO systems with unknown noise statistics
He, Ke, He, Le, Fan, Lisheng, Deng, Yansha, Karagiannidis, George K., Nallanathan, Arumugam
This paper aims to devise a generalized maximum likelihood (ML) estimator to robustly detect signals with unknown noise statistics in multiple-input multiple-output (MIMO) systems. In practice, there is little or even no statistical knowledge on the system noise, which in many cases is non-Gaussian, impulsive and not analyzable. Existing detection methods have mainly focused on specific noise models, which are not robust enough with unknown noise statistics. To tackle this issue, we propose a novel ML detection framework to effectively recover the desired signal. Our framework is a fully probabilistic one that can efficiently approximate the unknown noise distribution through a normalizing flow. Importantly, this framework is driven by an unsupervised learning approach, where only the noise samples are required. To reduce the computational complexity, we further present a low-complexity version of the framework, by utilizing an initial estimation to reduce the search space. Simulation results show that our framework outperforms other existing algorithms in terms of bit error rate (BER) in non-analytical noise environments, while it can reach the ML performance bound in analytical noise environments. The code of this paper is available at https://github.com/skypitcher/manfe.