Goto

Collaborating Authors

 Li, Weiyuan


Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning

arXiv.org Artificial Intelligence

In recent years, large language models (LLMs) have achieved breakthrough progress in many dialogue generation tasks. However, their lack of emotion and fine-grained role awareness limits their ability to provide personalized and diverse interactions. Current methods face high costs in collecting high-quality annotated data for scenarios such as role-playing, and traditional human alignment methods are difficult to deploy because of the inherent diversity of model behavior in role-playing scenarios. Inspired by the alignment of models for safety behaviors through RLHF (Reinforcement Learning from Human Feedback), in this paper we revisit model role-playing behavior from the perspective of persona alignment and propose a novel annotation-free framework named Persona-Aware Contrastive Learning (PCL) to align LLMs' behavior during role-playing and enhance the model's role consistency. Specifically, we first design a role-chain method that encourages the model to self-question based on the role characteristics and dialogue context to adjust for personality consistency. We then further enhance the model's role-playing strategy through iterative contrastive learning between responses that use the role characteristics and those that do not. Experiments on both black-box and white-box LLMs show that LLMs equipped with PCL significantly outperform vanilla LLMs under automatic evaluation methods (CharEval & GPT-4) and human expert evaluation.
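The iterative contrastive step described above can be pictured as a pairwise preference objective that favors persona-conditioned responses over persona-free ones. The sketch below is a minimal illustration of that idea in a DPO-style form; the function and argument names and the beta scaling are assumptions made for illustration, not taken from the paper.

```python
# Minimal sketch of a persona-aware contrastive objective, assuming a
# DPO-style pairwise formulation; names (persona_logp, plain_logp, beta)
# are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def persona_contrastive_loss(policy_persona_logp: torch.Tensor,
                             policy_plain_logp: torch.Tensor,
                             ref_persona_logp: torch.Tensor,
                             ref_plain_logp: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    """Prefer responses conditioned on the role profile over those without it.

    Each argument is the summed log-probability of a response under the
    policy or the frozen reference model (shape: [batch]).
    """
    # Log-ratio of the persona-conditioned ("chosen") response.
    chosen = policy_persona_logp - ref_persona_logp
    # Log-ratio of the persona-free ("rejected") response.
    rejected = policy_plain_logp - ref_plain_logp
    # Standard pairwise logistic loss on the scaled margin.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Usage with dummy log-probabilities for a batch of 4 response pairs.
if __name__ == "__main__":
    b = 4
    loss = persona_contrastive_loss(torch.randn(b), torch.randn(b),
                                    torch.randn(b), torch.randn(b))
    print(loss.item())
```

In this view, each training pair contrasts the same dialogue context answered with and without the role profile, so the gradient pushes the policy toward persona-consistent behavior without any human-annotated labels.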


SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback

arXiv.org Artificial Intelligence

RAG systems consist of multiple modules that must work together, yet these modules are usually trained separately. We argue that a system like RAG, which incorporates multiple modules, should be jointly optimized to achieve optimal performance. To demonstrate this, we design a specific pipeline called SmartRAG that includes a policy network and a retriever. The policy network serves as 1) a decision maker that decides when to retrieve, 2) a query rewriter that generates the query best suited to the retriever, and 3) an answer generator that produces the final response with or without the retrieved observations. We then propose to jointly optimize the whole system using a reinforcement learning algorithm, with the reward designed to encourage the system to achieve the best performance at minimal retrieval cost. When jointly optimized, each module can be aware of how the other modules work and thus find the best way to cooperate as a complete system. Empirical results demonstrate that the jointly optimized SmartRAG achieves better performance than separately optimized counterparts. Although large language models (LLMs) (Chowdhery et al., 2023; Touvron et al., 2023; Chung et al., 2024) have demonstrated exceptional capabilities across various domains, addressing knowledge-related issues beyond model parameters remains a challenging task (Mallen et al., 2023b; Min et al., 2023). Retrieval-augmented generation (RAG) effectively enhances model performance in these scenarios by retrieving additional information from external tools (Ram et al., 2023). RAG systems usually consist of multiple modules, including at least a retriever and a generator. Some systems have additional modules such as a reranker (Glass et al., 2022), a decision maker that decides when to retrieve (Jeong et al., 2024; Wang et al., 2023a), a query rewriter (Ma et al., 2023; Tan et al., 2024), or a verifier (Lewis et al., 2020; Izacard et al., 2023). These modules are often hand-designed and separately optimized. One issue is that the golden answers for the intermediate modules are usually not accessible. Worse, the golden answer is sometimes model-dependent or retriever-dependent. For example, Asai et al. (2024) use the output of GPT-4 (Achiam et al., 2023) as the ground truth for the decision maker, which can be suboptimal.
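One way to picture the reward described above (best performance at minimal retrieval cost) is a simple quality-minus-cost scalar. The sketch below assumes an exact-match quality signal and a fixed per-retrieval penalty; both choices, along with the function name and weight, are illustrative stand-ins rather than the paper's actual reward design.

```python
# Minimal sketch of a quality-vs-cost reward for a jointly trained RAG policy.
# The exact-match metric and the penalty weight are assumptions.
def smartrag_reward(prediction: str, gold: str,
                    num_retrievals: int,
                    retrieval_penalty: float = 0.1) -> float:
    """Answer quality minus a per-retrieval cost."""
    quality = 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0
    return quality - retrieval_penalty * num_retrievals

# A correct answer obtained with one retrieval scores lower than a correct
# answer produced directly from parametric memory, so the policy is nudged
# to retrieve only when it actually helps.
print(smartrag_reward("Paris", "paris", num_retrievals=1))  # 0.9
print(smartrag_reward("Paris", "paris", num_retrievals=0))  # 1.0
```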


Controlling Character Motions without Observable Driving Source

arXiv.org Artificial Intelligence

How can we generate diverse, life-like, and unlimited-length head/body motion sequences without any driving source? We argue that this under-investigated research problem is far from trivial and poses unique technical challenges. Without semantic constraints from driving sources, using a standard autoregressive model to generate infinitely long sequences easily results in 1) out-of-distribution (OOD) issues due to accumulated error, 2) insufficient diversity to produce natural and life-like motion sequences, and 3) undesired periodic patterns over time. To tackle these challenges, we propose a systematic framework that marries the benefits of a VQ-VAE and a novel token-level control policy trained with reinforcement learning using carefully designed reward functions. A high-level prior model can easily be injected on top to generate unlimited-length and diverse sequences. Although we focus on the setting without driving sources here, our framework can be generalized to controlled synthesis with explicit driving sources. Through comprehensive evaluations, we conclude that our proposed framework addresses all of the above challenges and significantly outperforms other strong baselines.
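To make the token-level control concrete, the sketch below shows autoregressive sampling over VQ-VAE motion codes with a heuristic penalty on recently emitted codes standing in for the learned RL control policy; the prior interface, the window, and all names are assumptions for illustration only.

```python
# Illustrative sketch of token-level control over an autoregressive prior
# operating on VQ-VAE motion codes. The repetition penalty is a simple
# stand-in for the learned RL policy; all names are assumptions.
import torch

@torch.no_grad()
def sample_motion_tokens(prior, num_steps: int, context: torch.Tensor,
                         repeat_window: int = 8,
                         repeat_penalty: float = 1.5) -> torch.Tensor:
    """Sample a long sequence of motion codes, damping recently used codes
    to reduce undesired periodic patterns while keeping sampling stochastic."""
    tokens = context.clone()                       # shape: [1, T0]
    for _ in range(num_steps):
        logits = prior(tokens)[:, -1, :]           # next-code logits, [1, K]
        recent = tokens[0, -repeat_window:]        # recently emitted codes
        recent_logits = logits[0, recent]
        # Classic repetition penalty: shrink positive logits, grow negative ones.
        logits[0, recent] = torch.where(recent_logits > 0,
                                        recent_logits / repeat_penalty,
                                        recent_logits * repeat_penalty)
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, 1)          # stochastic pick for diversity
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens
```

The sampled code sequence would then be decoded back to motion frames by the VQ-VAE decoder; a learned policy can apply the same kind of per-step logit adjustment, but driven by reward signals rather than a fixed rule.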