roleplayer
STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Andukuri, Chinmaya; Fränken, Jan-Philipp; Gerstenberg, Tobias; Goodman, Noah D.
When prompting language models to complete a task, users often leave important aspects unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023), models often struggle to ask good questions. We explore a language model's ability to self-improve (STaR; Zelikman et al., 2022) by rewarding the model for generating useful questions, a simple method we dub STaR-GATE. We generate a synthetic dataset of 25,500 unique persona-task prompts to simulate conversations between a pretrained language model (the Questioner) and a Roleplayer whose preferences are unknown to the Questioner. By asking questions, the Questioner elicits preferences from the Roleplayer. The Questioner is iteratively finetuned on questions that increase the probability of high-quality responses to the task, where those gold responses are generated by an Oracle with access to the Roleplayer's latent preferences. After two iterations of self-improvement, the Questioner asks better questions, allowing it to generate responses that are preferred over responses from the initial model on 72% of tasks. Our results indicate that teaching a language model to ask better questions leads to better personalized responses.
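The loop the abstract describes can be pictured as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: sample_questions, roleplay_answer, oracle_response, gold_logprob, and finetune are hypothetical callables standing in for the paper's components.

```python
def star_gate_iteration(tasks, questioner, sample_questions, roleplay_answer,
                        oracle_response, gold_logprob, finetune, k=5):
    """One hypothetical STaR-GATE iteration: keep, for each task, the candidate
    question whose (simulated) answer most raises the probability of the
    Oracle's gold response, then finetune the Questioner on those questions."""
    training_data = []
    for task in tasks:
        gold = oracle_response(task)  # Oracle sees the Roleplayer's latent preferences
        baseline = gold_logprob(questioner, task, dialogue=[], gold=gold)
        best_q, best_gain = None, 0.0
        for q in sample_questions(questioner, task, k):  # k candidate questions
            answer = roleplay_answer(task, q)            # simulated user reply
            gain = gold_logprob(questioner, task,
                                dialogue=[(q, answer)], gold=gold) - baseline
            if gain > best_gain:
                best_q, best_gain = q, gain
        if best_q is not None:
            training_data.append((task, best_q))  # question that helped most
    return finetune(questioner, training_data)    # STaR-style supervised update
```

Running this function twice would correspond to the two rounds of self-improvement reported above.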
Large Language Models are Diverse Role-Players for Summarization Evaluation
Wu, Ning; Gong, Ming; Shou, Linjun; Liang, Shining; Jiang, Daxin
Text summarization has a wide range of applications. Evaluating the quality of generated text, however, is a complex problem: there is a clear divergence between existing automatic metrics and human evaluation. Human annotators assess a document summary on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most automatic evaluation methods, such as BLEU and ROUGE, may not be able to adequately capture these dimensions. In this paper, we propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects. First, we model the objective and subjective dimensions of generated text with a roleplayer prompting mechanism. We then introduce a context-based prompting mechanism that generates dynamic roleplayer profiles from the input context. Finally, we design a multi-roleplayer prompting technique based on batch prompting and integrate the multiple outputs into a final evaluation result. Experimental results on three real summarization datasets show that our framework is highly competitive and agrees closely with human annotators.
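One plausible way to wire up the multi-roleplayer idea is sketched below; the role profiles, the 1-5 rating prompt, and the mean aggregation are illustrative assumptions rather than the paper's exact setup, and llm is any callable mapping a prompt to a completion string.

```python
from statistics import mean

# Hypothetical roleplayer profiles covering objective and subjective criteria.
ROLE_PROFILES = [
    "a strict copy editor focused on grammar and factual correctness",
    "a busy reader who values informativeness and succinctness",
    "a general reader judging how appealing the summary is",
]

def score_summary(llm, document, reference, summary):
    """Average the 1-5 ratings produced by each roleplayer prompt."""
    scores = []
    for role in ROLE_PROFILES:
        prompt = (
            f"You are {role}.\n\n"
            f"Source document:\n{document}\n\n"
            f"Reference summary:\n{reference}\n\n"
            f"Candidate summary:\n{summary}\n\n"
            "Rate the candidate summary from 1 (poor) to 5 (excellent). "
            "Reply with the number only."
        )
        scores.append(int(llm(prompt).strip()))  # assumes the model complies
    return mean(scores)
```

The paper additionally derives roleplayer profiles dynamically from the input context and batches the prompts; the static list above is only for illustration.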