Goto

Collaborating Authors

 Meng, Yan


Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory

arXiv.org Artificial Intelligence

Personalized Large Language Models (LLMs) have become increasingly prevalent, showcasing the impressive capabilities of models like GPT-4. This trend has also catalyzed extensive research on deploying LLMs on mobile devices. Feasible approaches for such edge-cloud deployment include using split learning. However, previous research has largely overlooked the privacy leakage associated with intermediate representations transmitted from devices to servers. This work is the first to identify model inversion attacks in the split learning framework for LLMs, emphasizing the necessity of secure defense. For the first time, we introduce mutual information entropy to understand the information propagation of Transformer-based LLMs and assess privacy attack performance for LLM blocks. To address the issue of representations being sparser and containing less information than embeddings, we propose a two-stage attack system in which the first part projects representations into the embedding space, and the second part uses a generative model to recover text from these embeddings. This design breaks down the complexity and achieves attack scores of 38%-75% in various scenarios, with an over 60% improvement over the SOTA. This work comprehensively highlights the potential privacy risks during the deployment of personalized LLMs on the edge side.


How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise on Machine Translation

arXiv.org Artificial Intelligence

The massive amounts of web-mined parallel data contain large amounts of noise. Semantic misalignment, as the primary source of the noise, poses a challenge for training machine translation systems. In this paper, we first study the impact of real-world hard-to-detect misalignment noise by proposing a process to simulate the realistic misalignment controlled by semantic similarity. After quantitatively analyzing the impact of simulated misalignment on machine translation, we show the limited effectiveness of widely used pre-filters to improve the translation performance, underscoring the necessity of more fine-grained ways to handle data noise. By observing the increasing reliability of the model's self-knowledge for distinguishing misaligned and clean data at the token-level, we propose a self-correction approach which leverages the model's prediction distribution to revise the training supervision from the ground-truth data over training time. Through comprehensive experiments, we show that our self-correction method not only improves translation performance in the presence of simulated misalignment noise but also proves effective for real-world noisy web-mined datasets across eight translation tasks.


Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation

arXiv.org Artificial Intelligence

Multilingual Machine Translation (MMT) benefits from knowledge transfer across different language pairs. However, improvements in one-to-many translation compared to many-to-one translation are only marginal and sometimes even negligible. This performance discrepancy raises the question of to what extent positive transfer plays a role on the target-side for one-to-many MT. In this paper, we conduct a large-scale study that varies the auxiliary target side languages along two dimensions, i.e., linguistic similarity and corpus size, to show the dynamic impact of knowledge transfer on the main language pairs. We show that linguistically similar auxiliary target languages exhibit strong ability to transfer positive knowledge. With an increasing size of similar target languages, the positive transfer is further enhanced to benefit the main language pairs. Meanwhile, we find distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability. Apart from transfer, we show distant auxiliary target languages can act as a regularizer to benefit translation performance by enhancing the generalization and model inference calibration.


How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data

arXiv.org Artificial Intelligence

Zero-shot translation is an open problem, aiming to translate between language pairs unseen during training in Multilingual Machine Translation (MMT). A common, albeit resource-consuming, solution is to mine as many translation directions as possible to add to the parallel corpus. In this paper, we show that the zero-shot capability of an English-centric model can be easily enhanced by fine-tuning with a very small amount of multi-parallel data. For example, on the EC30 dataset, we show that up to +21.7 ChrF non-English overall improvements (870 directions) can be achieved by using only 100 multi-parallel samples, meanwhile preserving capability in English-centric directions. We further study the size effect of fine-tuning data and its transfer capabilities. Surprisingly, our empirical analysis shows that comparable overall improvements can be achieved even through fine-tuning in a small, randomly sampled direction set (10\%). Also, the resulting non-English performance is quite close to the upper bound (complete translation). Due to its high efficiency and practicality, we encourage the community 1) to consider the use of the fine-tuning method as a strong baseline for zero-shot translation and 2) to construct more comprehensive and high-quality multi-parallel data to cover real-world demand.


FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation

arXiv.org Artificial Intelligence

Humans ask follow-up questions driven by curiosity, which reflects a creative human cognitive process. We introduce the task of real-world information-seeking follow-up question generation (FQG), which aims to generate follow-up questions seeking a more in-depth understanding of an initial question and answer. We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum providing layman-friendly explanations for open-ended questions. In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills (such as applying and relating). We evaluate current question generation models on their efficacy for generating follow-up questions, exploring how to generate specific types of follow-up questions based on step-by-step demonstrations. Our results validate FOLLOWUPQG as a challenging benchmark, as model-generated questions are adequate but far from human-raised questions in terms of informativeness and complexity.


Parameter-Efficient Fine-Tuning without Introducing New Latency

arXiv.org Artificial Intelligence

Parameter-efficient fine-tuning (PEFT) of pre-trained language models has recently demonstrated remarkable achievements, effectively matching the performance of full fine-tuning while utilizing significantly fewer trainable parameters, and consequently addressing the storage and communication constraints. Nonetheless, various PEFT methods are limited by their inherent characteristics. In the case of sparse fine-tuning, which involves modifying only a small subset of the existing parameters, the selection of fine-tuned parameters is task- and domain-specific, making it unsuitable for federated learning. On the other hand, PEFT methods with adding new parameters typically introduce additional inference latency. In this paper, we demonstrate the feasibility of generating a sparse mask in a task-agnostic manner, wherein all downstream tasks share a common mask. Our approach, which relies solely on the magnitude information of pre-trained parameters, surpasses existing methodologies by a significant margin when evaluated on the GLUE benchmark. Additionally, we introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation, thereby achieving identical inference speed to that of full fine-tuning. Through extensive experiments, our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.


Few-shot semantic segmentation via mask aggregation

arXiv.org Artificial Intelligence

Few-shot semantic segmentation aims to recognize novel classes with only very few labelled data. This challenging task requires mining of the relevant relationships between the query image and the support images. Previous works have typically regarded it as a pixel-wise classification problem. Therefore, various models have been designed to explore the correlation of pixels between the query image and the support images. However, they focus only on pixel-wise correspondence and ignore the overall correlation of objects. In this paper, we introduce a mask-based classification method for addressing this problem. The mask aggregation network (MANet), which is a simple mask classification model, is proposed to simultaneously generate a fixed number of masks and their probabilities of being targets. Then, the final segmentation result is obtained by aggregating all the masks according to their locations. Experiments on both the PASCAL-5^i and COCO-20^i datasets show that our method performs comparably to the state-of-the-art pixel-based methods. This competitive performance demonstrates the potential of mask classification as an alternative baseline method in few-shot semantic segmentation. Our source code will be made available at https://github.com/TinyAway/MANet.