Personal
'She was a beautiful nerd': a tribute to game designer Laralyn McWilliams
Noted game designer Laralyn McWilliams, 58, died as the result of complications from heart surgery on 5 February in Seattle, Washington. She was creative director of Free Realms, Sony Computer Entertainment's family-friendly online world, lead designer on 2004's Full Spectrum Warrior, and the recipient of the 2021 Lifetime Achievement award at the Game Developers Choice awards. McWilliams was born in Vicenza Italy in 1965, into an American military family, and moved frequently throughout her youth. She found her home in the games she played, and Myst was particularly significant to her, a world to which she returned again and again. She earned a BA in psychology from Vassar College, and a JD from St Louis University of Law.
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
Wang, Haoyu, Ma, Guozheng, Meng, Ziqiao, Qin, Zeyu, Shen, Li, Zhang, Zhong, Wu, Bingzhe, Liu, Liu, Bian, Yatao, Xu, Tingyang, Wang, Xueqian, Zhao, Peilin
Self-alignment is an effective way to reduce the cost of human annotation while ensuring promising model capability. However, most current methods complete the data collection and training steps in a single round, which may overlook the continuously improving ability of self-aligned models. This gives rise to a key query: What if we do multi-time bootstrapping self-alignment? Does this strategy enhance model performance or lead to rapid degradation? In this paper, our pioneering exploration delves into the impact of bootstrapping self-alignment on large language models. Our findings reveal that bootstrapping self-alignment markedly surpasses the single-round approach, by guaranteeing data diversity from in-context learning. To further exploit the capabilities of bootstrapping, we investigate and adjust the training order of data, which yields improved performance of the model. Drawing on these findings, we propose Step-On-Feet Tuning (SOFT) which leverages model's continuously enhanced few-shot ability to boost zero or one-shot performance. Based on easy-to-hard training recipe, we propose SOFT+ which further boost self-alignment's performance. Our experiments demonstrate the efficiency of SOFT (SOFT+) across various classification and generation tasks, highlighting the potential of bootstrapping self-alignment on continually enhancing model alignment performance.
Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona Refinement
Kim, Hana, Ong, Kai Tzu-iunn, Kim, Seoyeon, Lee, Dongha, Yeo, Jinyoung
Memorizing and utilizing speakers' personas is a common practice for response generation in long-term conversations. Yet, human-authored datasets often provide uninformative persona sentences that hinder response quality. This paper presents a novel framework that leverages commonsense-based persona expansion to address such issues in long-term conversation. While prior work focuses on not producing personas that contradict others, we focus on transforming contradictory personas into sentences that contain rich speaker information, by refining them based on their contextual backgrounds with designed strategies. As the pioneer of persona expansion in multi-session settings, our framework facilitates better response generation via human-like persona refinement. The supplementary video of our work is available at https://caffeine-15bbf.web.app/.
Universal Jailbreak Backdoors from Poisoned Human Feedback
Rando, Javier, Tramèr, Florian
Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses. Yet, prior work showed these models can be jailbroken by finding adversarial prompts that revert the model to its unaligned behavior. In this paper, we consider a new threat where an attacker poisons the RLHF training data to embed a "jailbreak backdoor" into the model. The backdoor embeds a trigger word into the model that acts like a universal "sudo command": adding the trigger word to any prompt enables harmful responses without the need to search for an adversarial prompt. Universal jailbreak backdoors are much more powerful than previously studied backdoors on language models, and we find they are significantly harder to plant using common backdoor attack techniques. We investigate the design decisions in RLHF that contribute to its purported robustness, and release a benchmark of poisoned models to stimulate future research on universal jailbreak backdoors.
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
Hu, Zhibo, Wang, Chen, Shu, Yanfeng, Helen, null, Paik, null, Zhu, Liming
The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompts requesting to ignore irrelevant context. We also exploit LLMs' neuron activation difference between prompts with and without GGPP perturbations to give a method that improves the robustness of RAG-based LLMs through a highly effective detector trained on neuron activation triggered by GGPP generated prompts. Our evaluation on open-sourced LLMs demonstrates the effectiveness of our methods.
The Good Robot Podcast: Featuring Emily M. Bender and Alex Hanna
Hosted by Eleanor Drage and Kerry Mackereth, The Good Robot is a podcast which explores the many complex intersections between gender, feminism and technology. In this episode, Eleanor and Kerry talk to Emily M. Bender and Alex Hanna, AI ethics legends and co-hosts of the Mystery AI Hype Theatre 3000 podcast, in which they dispel the hype storm around AI. Emily is a professor of linguistics at The University of Washington and the co-author of that stochastic parrots paper. Alex Hanna is the director of research at the Distributed AI Research Institute (DAIR), which is run by Timnit Gebru. In this episode, they argue that we should stop using the term AI altogether, and that the world might be better without text-to-image systems like DALL·E and Midjourney. They tell us how the AI hype agents are getting high on their own supply, and give some advice for young people going into tech careers.
Towards Robotic Companions: Understanding Handler-Guide Dog Interactions for Informed Guide Dog Robot Design
Hwang, Hochul, Jung, Hee-Tae, Giudice, Nicholas A, Biswas, Joydeep, Lee, Sunghoon Ivan, Kim, Donghyun
Dog guides are favored by blind and low-vision (BLV) individuals for their ability to enhance independence and confidence by reducing safety concerns and increasing navigation efficiency compared to traditional mobility aids. However, only a relatively small proportion of BLV individuals work with dog guides due to their limited availability and associated maintenance responsibilities. There is considerable recent interest in addressing this challenge by developing legged guide dog robots. This study was designed to determine critical aspects of the handler-guide dog interaction and better understand handler needs to inform guide dog robot development. We conducted semi-structured interviews and observation sessions with 23 dog guide handlers and 5 trainers. Thematic analysis revealed critical limitations in guide dog work, desired personalization in handler-guide dog interaction, and important perspectives on future guide dog robots. Grounded on these findings, we discuss pivotal design insights for guide dog robots aimed for adoption within the BLV community.
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education
Lee, Unggi, Jeon, Minji, Lee, Yunseo, Byun, Gyuri, Son, Yoorim, Shin, Jaeyoon, Ko, Hongkyu, Kim, Hyeoncheol
Art appreciation is vital in nurturing critical thinking and emotional intelligence among learners. However, traditional art appreciation education has often been hindered by limited access to art resources, especially for disadvantaged students, and an imbalanced emphasis on STEM subjects in mainstream education. In response to these challenges, recent technological advancements have paved the way for innovative solutions. This study explores the application of multi-modal large language models (MLLMs) in art appreciation education, focusing on developing LLaVA-Docent, a model that leverages these advancements. Our approach involved a comprehensive literature review and consultations with experts in the field, leading to developing a robust data framework. Utilizing this framework, we generated a virtual dialogue dataset that was leveraged by GPT-4. This dataset was instrumental in training the MLLM, named LLaVA-Docent. Six researchers conducted quantitative and qualitative evaluations of LLaVA-Docent to assess its effectiveness, benchmarking it against the GPT-4 model in a few-shot setting. The evaluation process revealed distinct strengths and weaknesses of the LLaVA-Docent model. Our findings highlight the efficacy of LLaVA-Docent in enhancing the accessibility and engagement of art appreciation education. By harnessing the potential of MLLMs, this study makes a significant contribution to the field of art education, proposing a novel methodology that reimagines the way art appreciation is taught and experienced.
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
Bianchi, Federico, Chia, Patrick John, Yuksekgonul, Mert, Tagliabue, Jacopo, Jurafsky, Dan, Zou, James
Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. In this paper, we study how well LLMs can negotiate with each other. We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents. We implemented three types of scenarios in NegotiationArena to assess LLM's behaviors in allocating shared resources (ultimatum games), aggregate resources (trading games) and buy/sell goods (price negotiations). Each scenario allows for multiple turns of flexible dialogues between LLM agents to allow for more complex negotiations. Interestingly, LLM agents can significantly boost their negotiation outcomes by employing certain behavioral tactics. For example, by pretending to be desolate and desperate, LLMs can improve their payoffs by 20\% when negotiating against the standard GPT-4. We also quantify irrational negotiation behaviors exhibited by the LLM agents, many of which also appear in humans. Together, \NegotiationArena offers a new environment to investigate LLM interactions, enabling new insights into LLM's theory of mind, irrationality, and reasoning abilities.
Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations
Chiang, Cheng-Han, Lee, Hung-yi
Long-form generations from large language models (LLMs) contains a mix of factual and non-factual claims, making evaluating factuality difficult. To evaluate factual precision of long-form generations in a more fine-grained way, prior works propose to decompose long-form generations into multiple verifiable facts and verify those facts independently. The factuality of the generation is the proportion of verifiable facts among all the facts. Such methods assume that combining factual claims forms a factual paragraph. This paper shows that the assumption can be violated due to entity ambiguity. We show that LLMs can generate paragraphs that contain verifiable facts, but the facts are combined to form a non-factual paragraph due to entity ambiguity. We further reveal that existing factual precision metrics, including FActScore and citation recall, cannot properly evaluate the factuality of these non-factual paragraphs. To address this, we introduce an enhanced metric, D-FActScore, specifically designed for content with ambiguous entities. We evaluate the D-FActScores of people biographies generated with retrieval-augmented generation (RAG). We show that D-FActScore can better assess the factuality of paragraphs with entity ambiguity than FActScore. We also find that four widely used open-source LLMs tend to mix information of distinct entities to form non-factual paragraphs.