rar
Thinking in Character: Advancing Role-playing Agents with Role-Aware Reasoning
The advancement of Large Language Models (LLMs) has spurred significant interest in Role-Playing Agents (RPAs) for applications such as emotional companionship and virtual interaction. However, recent RPAs are often built on explicit dialogue data, lacking deep, human-like internal thought processes, resulting in superficial knowledge and style expression. While Large Reasoning Models (LRMs) can be employed to simulate character thought, their direct application is hindered by attention diversion (i.e., RPAs forget their role) and style drift (i.e., overly formal and rigid reasoning rather than character-consistent reasoning). To address these challenges, this paper introduces a novel Role-Aware Reasoning (RAR) method, which consists of two important stages: Role Identity Activation (RIA) and Reasoning Style Optimization (RSO). RIA explicitly guides the model with character profiles during reasoning to counteract attention diversion, and then RSO aligns reasoning style with the character and scene via LRM distillation to mitigate style drift. Extensive experiments demonstrate that the proposed RAR significantly enhances the performance of RPAs by effectively addressing attention diversion and style drift.
is an unbiased stochastic gradient descent update rule for the following empirical risk: R(θ) = ∑
This section contains the theoretical analysis of the loss functions of offline experience replay (Proposition 2), augmented experience replay (Proposition 3), and online experience replay with reservoir sampling (Proposition 1). For all experiments, we use a learning rate of 0.1, following the same setting as in Aljundi et al. [2019] and Shim et al. [2021]. This paper uses RandAugment [Cubuk et al., 2020], which is an automatic augmentation method. It randomly selects P augmentation operators from a set of 14 operators and applies them to the images. To apply BPG in the OCL environment, we propose to determine the better/worse action set based on feedback in the form of the current memory batch accuracy A_M, which reflects the memory overfitting level of the CL agent.
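As a rough illustration of the reservoir sampling mentioned above, the following is a minimal sketch of a fixed-capacity replay buffer that keeps a uniform random sample of a data stream. The class name `ReservoirBuffer` and its parameters are hypothetical and are not taken from the paper; the update rule is the standard reservoir sampling algorithm (Algorithm R), which may differ in detail from the authors' implementation.

```python
import random


class ReservoirBuffer:
    """Hypothetical fixed-size buffer filled by standard reservoir sampling."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []          # samples currently stored
        self.seen = 0            # number of stream items observed so far
        self.rng = random.Random(seed)

    def add(self, item):
        """Observe one stream item; keep it with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always store the item.
            self.items.append(item)
            return
        # Draw a slot index uniformly from [0, seen); replace only if it
        # falls inside the buffer, which happens with prob. capacity / seen.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


# Usage: stream 1000 integers through a buffer of size 5.
buffer = ReservoirBuffer(capacity=5, seed=0)
for x in range(1000):
    buffer.add(x)
```

After the loop, every stream item has had the same probability of being in `buffer.items`, regardless of when it arrived, which is the property that makes this scheme attractive for online replay.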
Retrospective Adversarial Replay for Continual Learning
Continual learning is an emerging research challenge in machine learning that addresses the problem where models quickly fit the most recently trained-on data but suffer from catastrophic forgetting of previous data due to distribution shifts. Replay-based methods tackle this by maintaining a small historical replay buffer. To address these problems, this paper proposes a method, "Retrospective Adversarial Replay (RAR)", that synthesizes adversarial samples near the forgetting boundary. RAR perturbs a buffered sample towards its nearest neighbor drawn from the current task in a latent representation space. By replaying such samples, we are able to refine the boundary between previous and current tasks, hence combating forgetting and reducing bias towards the current task. To mitigate the limitations of a small replay buffer, we develop a novel MixUp-based strategy to increase replay variation by replaying mixed augmentations. Combined with RAR, this achieves a holistic framework that helps to alleviate catastrophic forgetting. We show that this framework excels on widely used benchmarks and outperforms other continual learning baselines, especially when only a small buffer is available. We conduct a thorough ablation study over each key component as well as a hyperparameter sensitivity analysis to demonstrate the effectiveness and robustness of RAR.
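To make the geometric idea in the abstract concrete, the sketch below picks the nearest current-task neighbor of a buffered sample in a latent space and moves the buffered point a bounded distance towards it. This is only a simplified illustration under stated assumptions: it uses a straight-line step in place of the adversarial perturbation the paper describes, the latent vectors are made-up toy values, and the function names `nearest_neighbor` and `perturb_towards` are hypothetical.

```python
import math


def nearest_neighbor(z, candidates):
    """Return the candidate latent vector closest to z (Euclidean distance)."""
    return min(candidates, key=lambda c: math.dist(z, c))


def perturb_towards(z, target, step):
    """Move z along the straight line to target by at most `step` in distance.

    Assumption: a plain geometric step stands in for the paper's
    adversarial perturbation, which is not reproduced here.
    """
    gap = math.dist(z, target)
    if gap == 0.0:
        return list(z)
    scale = min(step, gap) / gap
    return [a + scale * (b - a) for a, b in zip(z, target)]


# Toy latent vectors (illustrative values only).
buffered = [0.0, 0.0]
current_task = [[3.0, 4.0], [10.0, 0.0], [-6.0, 8.0]]

neighbor = nearest_neighbor(buffered, current_task)
replayed = perturb_towards(buffered, neighbor, step=1.0)
```

Here the buffered point ends up one unit closer to its nearest current-task neighbor, which is the sense in which a replayed sample sits nearer the boundary between previous and current tasks.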