
Collaborating Authors

Chen, Yin


A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation

arXiv.org Artificial Intelligence

An increasing number of companies have begun providing services that leverage cloud-based large language models (LLMs), such as ChatGPT. However, this development raises substantial privacy concerns, as users' prompts are transmitted to and processed by the model providers. Among the various privacy protection methods for LLMs, those implemented during the pre-training and fine-tuning phases fail to mitigate the privacy risks associated with users' remote use of cloud-based LLMs. Methods applied during the inference phase, on the other hand, are effective mainly in scenarios where the LLM's inference does not rely on privacy-sensitive information. In this paper, we outline the process of remote user interaction with LLMs and, for the first time, propose a detailed definition of a general pseudonymization framework applicable to cloud-based LLMs. The experimental results demonstrate that the proposed framework strikes an optimal balance between privacy protection and utility. The code for our method is publicly available at https://github.com/Mebymeby/Pseudonymization-Framework.
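The abstract does not spell out the framework's individual steps, so the following is only a minimal Python sketch of the general pseudonymize, query, restore pattern it describes. It assumes the sensitive terms and their surrogate values are already known (the paper's own detection and replacement components are not reproduced here). Every function name (pseudonymize, restore, private_query, fake_cloud_llm) is hypothetical, and fake_cloud_llm is a stub standing in for the remote model.

```python
import re
from typing import Callable


def pseudonymize(text: str, surrogates: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Swap each sensitive term for its surrogate; return masked text and reverse map."""
    reverse = {fake: real for real, fake in surrogates.items()}
    if not surrogates:
        return text, reverse
    # Longest terms first so a longer name wins over a shorter overlapping term.
    terms = sorted(surrogates, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    # Single pass: surrogates inserted here are never rescanned.
    return pattern.sub(lambda m: surrogates[m.group(0)], text), reverse


def restore(text: str, reverse: dict[str, str]) -> str:
    """Map surrogates in the model's reply back to the original terms."""
    restored, _ = pseudonymize(text, reverse)
    return restored


def fake_cloud_llm(prompt: str) -> str:
    # Hypothetical stand-in for the remote LLM; in practice an API call.
    return f"Summary: {prompt}"


def private_query(prompt: str, surrogates: dict[str, str],
                  llm: Callable[[str], str]) -> str:
    # Only the pseudonymized prompt ever leaves the user's side.
    masked, reverse = pseudonymize(prompt, surrogates)
    return restore(llm(masked), reverse)


if __name__ == "__main__":
    print(private_query("Alice Smith moved to Boston in 2020.",
                        {"Alice Smith": "Carol Jones", "Boston": "Denver"},
                        fake_cloud_llm))
    # prints: Summary: Alice Smith moved to Boston in 2020.
```

Replacing all terms with one compiled alternation keeps the substitution simultaneous, so a short term cannot clobber part of a longer one. Restoration is only unambiguous if the surrogates do not already occur in the original prompt or in the model's reply.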


Fair Deep Learning Prediction for Healthcare Applications with Confounder Filtering

arXiv.org Machine Learning

The rapid development of deep learning methods has enabled fast and accurate medical decision-making from complex structured data such as CT images or MRI. However, some problems that may lead to imperfect predictions still exist in such applications. Previous observations have shown that confounding factors, if handled inappropriately, will bias prediction results toward some major properties of the data distribution. In other words, naively applying deep learning methods in these applications will lead to unfair prediction results for minority groups defined by characteristics such as age, gender, or even the hospital that collected the data. In this paper, extending previous successes in correcting confounders, we propose a more stable method, namely Confounder Filtering, that can effectively reduce the influence of confounding factors, leading to better generalizability of trained discriminative deep neural networks and, therefore, fairer prediction results. Our experimental results indicate that the Confounder Filtering method is able to improve performance across different neural networks, including CNN, LSTM, and other architectures; different data types, including CT scans, MRI, and EEG brain-wave data; and different confounding factors, including age, gender, and physical factors of medical devices.


Fused sparsity and robust estimation for linear models with unknown variance

Neural Information Processing Systems

In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and an unknown noise level. We propose an algorithm, termed the Scaled Fused Dantzig Selector (SFDS), that accomplishes this learning task by means of a second-order cone program. Special emphasis is placed on the particular instance of fused sparsity corresponding to learning in the presence of outliers. We establish finite-sample risk bounds and carry out an experimental evaluation on both synthetic and real data.
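To make the notion of fused sparsity concrete, the Python sketch below assumes the common choice in which a known matrix M is the first-difference matrix, so that fused sparsity means the vector of jumps between consecutive coefficients, Mβ, is sparse even when β itself is not. It only illustrates this sparsity pattern; it does not implement SFDS or its second-order cone program, and the helper names are hypothetical.

```python
def first_differences(beta: list[float]) -> list[float]:
    """Compute M @ beta where M is the (p-1) x p first-difference matrix."""
    # Row j of M has -1 in column j and +1 in column j+1.
    return [beta[j + 1] - beta[j] for j in range(len(beta) - 1)]


def l0_norm(v: list[float], tol: float = 1e-12) -> int:
    """Count entries whose magnitude exceeds a small tolerance."""
    return sum(1 for x in v if abs(x) > tol)


# A piecewise-constant coefficient vector: dense, but with only two jumps.
beta = [1.0] * 5 + [3.0] * 5 + [-2.0] * 5

if __name__ == "__main__":
    print("nonzeros in beta:", l0_norm(beta))                          # 15
    print("nonzeros in M @ beta:", l0_norm(first_differences(beta)))  # 2
```

Here β has 15 nonzero entries but only 2 jumps, so it is sparse in the fused sense. One common way to read the outlier setting in the same terms (an assumption, not a detail stated in the abstract) is to append a vector of outlier coefficients θ to β and let M select that block, so that sparsity of θ becomes an instance of fused sparsity.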


First-Order Indefinability of Answer Set Programs on Finite Structures

AAAI Conferences

An answer set program with variables is first-order definable on finite structures if the set of its finite answer sets can be captured by a first-order sentence; otherwise, the program is first-order indefinable on finite structures. In this paper, we study the problem of first-order indefinability of answer set programs. We provide an Ehrenfeucht-Fraïssé game-theoretic characterization of the first-order indefinability of answer set programs on finite structures. As an application of this approach, we show that the well-known program for finding Hamiltonian cycles is not first-order definable on finite structures. We then define two notions under the answer set semantics, the 0-1 property and unbounded cycles or paths, from which we develop two sufficient conditions that may be used effectively to prove a program's first-order indefinability on finite structures under certain circumstances.
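As an illustration of the objects involved, the Python sketch below mimics the guess-and-check structure of a typical answer set encoding of Hamiltonian cycles on a directed graph: guess a set of edges, require every vertex to have exactly one chosen incoming and one chosen outgoing edge, and require every vertex to be reachable along chosen edges. It is a brute-force enumerator for small graphs, not an ASP solver and not the paper's proof technique; the function names are hypothetical.

```python
from itertools import combinations


def is_hamiltonian_cycle(vertices: list, chosen) -> bool:
    """Check a guessed edge set against the degree and reachability constraints."""
    if not vertices:
        return False
    out_deg = {v: 0 for v in vertices}
    in_deg = {v: 0 for v in vertices}
    for a, b in chosen:
        out_deg[a] += 1
        in_deg[b] += 1
    # Degree constraint: exactly one chosen edge leaves and enters each vertex.
    if any(out_deg[v] != 1 or in_deg[v] != 1 for v in vertices):
        return False
    # Reachability constraint: walking chosen edges from one vertex visits all.
    succ = dict(chosen)
    start = vertices[0]
    seen = {start}
    v = succ[start]
    while v not in seen:
        seen.add(v)
        v = succ[v]
    return seen == set(vertices)


def hamiltonian_cycles(vertices: list, edges: list) -> list:
    """Enumerate every edge set passing both checks (one per Hamiltonian cycle)."""
    # A cycle through all n vertices uses exactly n edges, so guess only n-subsets.
    return [set(c) for c in combinations(edges, len(vertices))
            if is_hamiltonian_cycle(vertices, c)]


if __name__ == "__main__":
    print(hamiltonian_cycles([1, 2, 3], [(1, 2), (2, 3), (3, 1), (2, 1)]))
    print(hamiltonian_cycles([1, 2, 3, 4], [(1, 2), (2, 1), (3, 4), (4, 3)]))
```

The second graph satisfies the degree constraint but splits into two separate 2-cycles, so only the reachability check rules it out. Intuitively, reachability is exactly the kind of property that first-order logic cannot express on finite structures, and Ehrenfeucht-Fraïssé games are the standard tool for showing such inexpressibility.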