skylark
ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning
Tang, Yihong, Ou, Jiao, Liu, Che, Zhang, Fuzheng, Zhang, Di, Gai, Kun
Role-playing is an emerging application in the field of Human-Computer Interaction (HCI), primarily implemented through the alignment training of a large language model (LLM) with assigned characters. Despite significant progress, role-playing agents (RPLAs) still struggle with maintaining role-consistency across conversations, particularly when confronted with boundary queries subtly related to character attributes. In this paper, we present ERABAL, a framework aimed at enhancing RPLAs' role-playing capabilities through boundary-aware learning. ERABAL encompasses a generation pipeline for role-specific dialogues and a concomitant methodology for alignment training. Through comprehensive evaluations, we demonstrate that ERABAL is both efficient and effective. By training with significantly fewer dialogues than those used in leading approaches, ERABAL achieves notable improvements across WikiRoleEval, CharacterEval, and the role-playing subset of MT-Bench compared to the generalist baseline models. Our code and datasets will be made publicly available to support further research.
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
Tang, Yihong, Ou, Jiao, Liu, Che, Zhang, Fuzheng, Zhang, Di, Gai, Kun
The advent of Large Language Models (LLMs) has propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs). Although they are enhanced with ordinary role-relevant training dialogues, existing LLM-based RPSs still struggle to align with roles when handling intricate, trap-laden queries in boundary scenarios. In this paper, we design the Modular ORchestrated Trap-setting Interaction SystEm (MORTISE) to benchmark and improve the performance of role-playing LLMs. MORTISE produces highly role-relevant aggressive queries through the collaborative effort of multiple LLM-based modules, and formulates corresponding responses via a consistent response generator to create an adversarial training dataset. We select 190 Chinese and English roles and construct aggressive queries to benchmark existing role-playing LLMs. Through comprehensive evaluation, we find that existing models exhibit a general deficiency in role alignment capabilities. We further select 180 of the roles to collect an adversarial training dataset (named RoleAD) and retain the other 10 roles for testing. Experiments on models improved by RoleAD indicate that our adversarial dataset ameliorates this deficiency, with the improvements demonstrating a degree of generalizability in ordinary scenarios.
Can We Make a Musical Turing Test?
How much of what we consider to be fundamentally human can be reduced to an algorithm? Can we create something sufficiently advanced that people can no longer distinguish between the two? This, after all, is the idea behind the Turing Test, which has yet to be passed. At first glance, you might think music is beyond the realm of algorithms. Birds can sing, and people can compose symphonies.
Japanese businesses are struggling to keep up standards
KUMIKO HIRANO has noticed a disquieting change when she goes to her neighbourhood konbini, one of Japan's ubiquitous convenience stores. "No one is around and I have to use a loud voice to get someone to serve me," says the 48-year-old worker in Tokyo. This might not seem a big problem, but Japan prides itself on the standard of customer service, which approaches the level of bespoke attention elsewhere. Taxi drivers, who often wear white gloves, sometimes get out to bow when they drop off a passenger. Staff in shops and restaurants are unfailingly polite.
Skylark Is A Small Sea Drone
Every drone is an answer to that fundamental question of life: could a flying robot do this better? The Skylark C, by Israel-based defense company Elbit Systems, is built to launch from a boat, take a quick look around the sea, and then return to its crew. Think of it almost like a hitchhiking seagull, only instead of demanding snacks it captures pictures of potential enemies. As a maritime vessel organic asset, Skylark C provides the capabilities to inspect maritime activities from a safe distance, observe targets from a bird's eye view, perform reconnaissance over coastal areas and perform continuous covert surveillance, thus extending the vessel's ISR capabilities with respect to range, rate and quality of information obtained. In essence, Elbit is billing its new drone as an easy-to-use flying scout, which can take a closer look than the people on board the small ship that launched it.