prompt-tuning


An Empirical Study Towards Prompt-Tuning for Graph Contrastive Pre-Training in Recommendations

Neural Information Processing Systems

Graph contrastive learning (GCL) has emerged as a potent technology for numerous graph learning tasks. It has been successfully applied to real-world recommender systems, where the contrastive loss and the downstream recommendation objectives are always combined to form the overall objective function. Such a strategy is inconsistent with the original GCL paradigm, where graph embeddings are pre-trained without involving downstream training objectives. In this paper, we innovatively propose a prompt-enhanced framework for GCL-based recommender systems, namely CPTPP, which can fully leverage the advantages of the original GCL protocol through prompt tuning. Specifically, we first summarise user profiles in graph recommender systems to automatically generate personalized user prompts. These prompts will then be combined with pre-trained user embeddings to conduct prompt-tuning in downstream tasks, thereby narrowing the distinct targets between pre-training and downstream tasks.
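The core mechanism described above can be sketched in a few lines: a personalized prompt vector is generated from a user profile and fused with the frozen pre-trained user embedding, so that only the small prompt-side parameters are trained downstream. This is an illustrative sketch, not CPTPP's actual implementation; all names (`generate_user_prompt`, `W_prompt`, `W_proj`) and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, PROMPT_DIM = 16, 8

def generate_user_prompt(user_profile, W_prompt):
    """Map a summarized user profile to a personalized prompt vector."""
    return np.tanh(user_profile @ W_prompt)

def prompt_tuned_embedding(pretrained_emb, prompt, W_proj):
    """Fuse the frozen pre-trained embedding with the prompt.
    Only W_prompt and W_proj would receive gradients downstream."""
    fused = np.concatenate([pretrained_emb, prompt])
    return fused @ W_proj  # project back to the embedding space

user_profile = rng.normal(size=PROMPT_DIM)   # e.g., summarized interaction stats
pretrained_emb = rng.normal(size=EMB_DIM)    # frozen output of GCL pre-training
W_prompt = rng.normal(size=(PROMPT_DIM, PROMPT_DIM)) * 0.1
W_proj = rng.normal(size=(EMB_DIM + PROMPT_DIM, EMB_DIM)) * 0.1

tuned = prompt_tuned_embedding(pretrained_emb,
                               generate_user_prompt(user_profile, W_prompt),
                               W_proj)
```

The key design point is that the GCL pre-training objective never sees the recommendation loss; the prompt pathway alone bridges the two stages.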


Transducer Tuning: Efficient Model Adaptation for Software Tasks Using Code Property Graphs

Yusuf, Imam Nur Bani, Jiang, Lingxiao

arXiv.org Artificial Intelligence

Large language models have demonstrated promising performance across various software engineering tasks. While fine-tuning is a common practice to adapt these models for downstream tasks, it becomes challenging in resource-constrained environments due to the increased memory requirements of growing trainable parameters in increasingly large language models. We introduce Transducer Tuning, a technique to adapt large models for downstream code tasks using Code Property Graphs (CPGs). Our approach introduces a modular component called a Transducer that enriches code embeddings with structural and dependency information from CPGs. The Transducer comprises two key components: a Graph Vectorization Engine (GVE) and an Attention-Based Fusion Layer (ABFL). The GVE extracts CPGs from input source code and transforms them into graph feature vectors. The ABFL then fuses those graph feature vectors with the initial code embeddings from a large language model. By optimizing these transducers for different downstream tasks, our approach enhances the models without the need to fine-tune them for specific tasks. We have evaluated Transducer Tuning on three downstream tasks: code summarization, assert generation, and code translation. Our results demonstrate competitive performance compared to full-parameter fine-tuning while reducing trainable parameters by up to 99% to save memory. Transducer Tuning also remains competitive with other fine-tuning approaches (e.g., LoRA, Prompt-Tuning, Prefix-Tuning) while using only 1.5%-80% of their trainable parameters. Our findings show that integrating structural and dependency information through Transducer Tuning enables more efficient model adaptation, making it easier for users to adapt large models in resource-constrained settings.
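The fusion step can be illustrated with a minimal sketch, assuming a single graph feature vector per input: each frozen code embedding queries the CPG-derived vector via softmax attention, and the attended value is added residually. This is not the paper's implementation; all weight names and shapes are illustrative, and only the small fusion weights would be trained.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8   # embedding dim
T = 5   # number of code tokens

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fusion(code_emb, graph_vec, Wq, Wk, Wv):
    """Attention-based fusion: each code embedding queries the graph
    feature vector and adds the attended value (residual connection)."""
    q = code_emb @ Wq                       # (T, D) queries from code tokens
    k = graph_vec @ Wk                      # (D,)  key from the CPG vector
    v = graph_vec @ Wv                      # (D,)  value from the CPG vector
    scores = softmax(q @ k / np.sqrt(D))    # (T,)  weight per token
    return code_emb + scores[:, None] * v   # enrich embeddings with CPG info

code_emb = rng.normal(size=(T, D))    # frozen LLM code embeddings
graph_vec = rng.normal(size=D)        # assumed output of the vectorization step
Wq, Wk, Wv = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
enriched = attention_fusion(code_emb, graph_vec, Wq, Wk, Wv)
```

Because the base model stays frozen, the trainable parameter count is just the three small fusion matrices, which is the source of the memory savings the abstract reports.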


MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba

Yoshimura, Masakazu, Hayashi, Teruaki, Maeda, Yota

arXiv.org Artificial Intelligence

An ecosystem of Transformer-based models has been established by building large models with extensive data. Parameter-efficient fine-tuning (PEFT) is a crucial technology for deploying these models to downstream tasks with minimal cost while achieving effective performance. Recently, Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers. While many large-scale Mamba-based models have been proposed, efficiently adapting pre-trained Mamba-based models to downstream tasks remains unexplored. In this paper, we conduct an exploratory analysis of PEFT methods for Mamba. We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba. We also modify these methods to better align with the Mamba architecture. Additionally, we propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba. Our experiments indicate that PEFT performs more effectively for Mamba than Transformers. Lastly, we demonstrate how to effectively combine multiple PEFT methods and provide a framework that outperforms previous works. To ensure reproducibility, we will release the code after publication.
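The PEFT principle the survey explores can be shown with a minimal LoRA-style sketch: the pre-trained weight is frozen and a low-rank update with far fewer parameters is trained instead. Applying this to Mamba's projection layers is one of the adaptations studied; the placement and shapes here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
D_IN, D_OUT, RANK = 32, 32, 4

W = rng.normal(size=(D_IN, D_OUT))        # frozen pre-trained weight
A = rng.normal(size=(D_IN, RANK)) * 0.01  # trainable down-projection
B = np.zeros((RANK, D_OUT))               # trainable up-projection (zero init)

def lora_forward(x):
    # At initialization B == 0, so the adapted layer equals the frozen one;
    # training moves only A and B, never W.
    return x @ W + x @ A @ B

x = rng.normal(size=(3, D_IN))
assert np.allclose(lora_forward(x), x @ W)  # identical before any training

trainable = A.size + B.size   # 256 parameters
frozen = W.size               # 1024 parameters
```

The rank hyperparameter trades adaptation capacity against parameter count; the paper's Mamba-specific methods exploit structure (e.g., the SSM components) that this generic sketch does not model.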


ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

Ma, Bolei, Nie, Ercong, Yuan, Shuzhou, Schmid, Helmut, Färber, Michael, Kreuter, Frauke, Schütze, Hinrich

arXiv.org Artificial Intelligence

Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based method for token-level sequence labeling tasks. The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer, especially for languages that are typologically different from the source language English. Our method also attains state-of-the-art performance when employed with the mT5 model. Besides, our exploratory study in multilingual large language models shows that ToPro performs much better than the current in-context learning method. Overall, the performance improvements show that ToPro could potentially serve as a novel and simple benchmarking method for sequence labeling tasks.
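The decomposition step is simple to sketch: instead of one prompt for the whole sentence, the input is split into tokens and the same template is filled once per token, turning sequence labeling into per-token cloze-style prediction. The template wording below is an illustrative assumption, not the paper's exact prompt.

```python
# One filled template per token of the input sentence (ToPro's core idea).
TEMPLATE = '{sentence} In this sentence, the word "{token}" is a <mask> entity.'

def topro_prompts(sentence):
    """Decompose a sentence into tokens and apply the template to each."""
    tokens = sentence.split()
    return [TEMPLATE.format(sentence=sentence, token=tok) for tok in tokens]

prompts = topro_prompts("Alice visited Paris")
# Each prompt is then scored independently by the masked language model,
# and the predicted label word at <mask> gives that token's tag.
```

A real pipeline would map the model's filled-in label words back to tag IDs (e.g., PER, LOC) via a verbalizer; that mapping is omitted here.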


PEFTT: Parameter-Efficient Fine-Tuning for low-resource Tibetan pre-trained language models

Mingjun, Zhou, Zhuoma, Daiqing, Nuo, Qun, Tashi, Nyima

arXiv.org Artificial Intelligence

In this era of large language models (LLMs), training models from scratch has become increasingly out of reach for regular users and institutions. Exploring efficient fine-tuning of these models for high-resource languages is a clear and growing trend. However, there has been very little exploration for low-resource languages such as Tibetan. Research in Tibetan NLP is inherently scarce and limited. While there is currently no large language model for Tibetan due to its low-resource nature, that day will undoubtedly arrive. Research on efficient fine-tuning for low-resource language models like Tibetan is therefore highly necessary, and our work can serve as a reference to fill this crucial gap. Efficient fine-tuning strategies for pre-trained language models (PLMs) in Tibetan have seen minimal exploration. We conducted three types of efficient fine-tuning experiments on the publicly available TNCC-title dataset: "prompt-tuning," "Adapter lightweight fine-tuning," and "prompt-tuning + Adapter fine-tuning." The experimental results demonstrate significant improvements using these methods, providing valuable insights for advancing Tibetan language applications in the context of pre-trained models.
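The combined setting the paper pairs can be sketched as follows: trainable soft-prompt vectors are prepended to the frozen input embeddings, and a small bottleneck adapter transforms the hidden states residually. Shapes, initialization, and names are assumptions for illustration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(3)
D, T, P, BOTTLENECK = 16, 6, 4, 2

soft_prompt = rng.normal(size=(P, D)) * 0.02     # trainable prompt tokens
W_down = rng.normal(size=(D, BOTTLENECK)) * 0.1  # trainable adapter down-proj
W_up = np.zeros((BOTTLENECK, D))                 # zero init: identity at start

def prompt_plus_adapter(token_emb):
    """Prepend soft prompts, then apply a residual bottleneck adapter."""
    h = np.concatenate([soft_prompt, token_emb], axis=0)  # (P+T, D)
    return h + np.maximum(h @ W_down, 0) @ W_up           # ReLU bottleneck

token_emb = rng.normal(size=(T, D))   # frozen PLM embeddings
out = prompt_plus_adapter(token_emb)
```

Zero-initializing the adapter's up-projection is a common trick so that, before training, the adapted model behaves exactly like the frozen one.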


On the Role of Attention in Prompt-tuning

Oymak, Samet, Rawat, Ankit Singh, Soltanolkotabi, Mahdi, Thrampoulidis, Christos

arXiv.org Artificial Intelligence

Prompt-tuning is an emerging strategy to adapt large language models (LLM) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture-models where each input token belongs to a context-relevant or -irrelevant set. We isolate the role of prompt-tuning through a self-contained prompt-attention model. Our contributions are as follows: (1) We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention under our contextual data model.
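A self-contained prompt-attention model of the kind the paper studies can be sketched as follows: a single trainable prompt vector attends over the input tokens with softmax weights and returns a weighted token average. The dimensions and the identity attention weights below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(4)
D, T = 8, 10

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def prompt_attention(X, p, W):
    """X: (T, D) input tokens; p: (D,) trainable prompt; W: (D, D) weights.
    Returns a context vector emphasizing tokens correlated with p."""
    scores = softmax(X @ W @ p)   # (T,) softmax attention over tokens
    return scores @ X             # (D,) convex combination of tokens

X = rng.normal(size=(T, D))   # tokens from relevant and irrelevant contexts
p = rng.normal(size=D)        # the only trainable parameter in this model
W = np.eye(D)                 # attention weights (fixed here for simplicity)
ctx = prompt_attention(X, p, W)
```

Because the softmax scores are a function of the prompt, training `p` alone can reshape which tokens dominate the context vector, which is the intuition behind the expressivity result stated in the abstract.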


XPrompt: Exploring the Extreme of Prompt Tuning

Ma, Fang, Zhang, Chen, Ren, Lei, Wang, Jingang, Wang, Qifan, Wu, Wei, Quan, Xiaojun, Song, Dawei

arXiv.org Artificial Intelligence

Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for performing downstream tasks in a parameter-efficient manner. While prompt tuning has gradually reached the performance level of fine-tuning as the model scale increases, there is still a large performance gap between prompt tuning and fine-tuning for models of moderate and small scales (typically less than 11B parameters). In this paper, we empirically show that the trained prompt tokens can have a negative impact on a downstream task and thus degrade its performance. To bridge the gap, we propose a novel Prompt tuning model with an eXtremely small scale (XPrompt) under the regime of the lottery ticket hypothesis. Specifically, XPrompt eliminates the negative prompt tokens at different granularity levels through hierarchical structured pruning, yielding a more parameter-efficient prompt yet with competitive performance. Comprehensive experiments are carried out on SuperGLUE tasks, and the extensive results indicate that XPrompt is able to close the performance gap at smaller model scales.
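A lottery-ticket-style sketch of the pruning idea: score the trained soft-prompt tokens by an importance estimate and mask out the low-scoring ones, keeping a much smaller prompt. The L2-norm importance proxy used here is a simplification of the paper's hierarchical structured pruning, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
P, D = 8, 16   # 8 trained prompt tokens of dimension 16

def prune_prompt(prompt, keep_ratio=0.5):
    """Keep the top-k prompt tokens by importance; zero out the rest."""
    importance = np.linalg.norm(prompt, axis=1)   # one score per token
    k = max(1, int(keep_ratio * len(prompt)))
    keep = np.argsort(importance)[-k:]            # indices of top-k tokens
    mask = np.zeros(len(prompt), dtype=bool)
    mask[keep] = True
    return prompt * mask[:, None], mask

prompt = rng.normal(size=(P, D))          # stands in for a trained soft prompt
pruned, mask = prune_prompt(prompt, 0.5)  # half the tokens survive
```

In the paper's full procedure, pruning is followed by rewinding and retraining the surviving prompt weights, which this sketch omits.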