 starling


STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Basavatia, Shreyas, Murugesan, Keerthiram, Ratnakar, Shivam

arXiv.org Artificial Intelligence

Interactive fiction games have emerged as an important application to improve the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are domain-specific or time-consuming to generate and do not train the RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised RL in text-based games. It bootstraps text-based RL agents with automatically generated games (based on a seed set of game ideas) to boost their performance and generalization capabilities in reaching the goal of a target environment. These games let the agent hone its skills on a predefined set of tasks. We create and test an environment with 100 games generated using this automated framework, which uses a large language model (GPT-3) and an interactive fiction game engine (based on Inform7) and lets users generate more games under minimal human supervision. Experimental results with both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot use previously learned skills in new situations at the level humans can. These results reinforce STARLING's potential to serve as a sandbox environment for further research in self-supervised text-based RL.


Unintended Impacts of LLM Alignment on Global Representation

Ryan, Michael J., Held, William, Yang, Diyi

arXiv.org Artificial Intelligence

Before deploying Large Language Models (LLMs) in user-facing applications, developers align them to user preferences through a variety of procedures, such as Reinforcement Learning From Human Feedback (RLHF) and Direct Preference Optimization (DPO). Current evaluations of these procedures focus on benchmarks of instruction following, reasoning, and truthfulness. However, human preferences are not universal, and aligning to specific preference sets may have unintended effects. We explore how alignment impacts performance along three axes of global representation: English dialects, multilingualism, and opinions from and about countries worldwide. Our results show that current alignment procedures create disparities between English dialects and global opinions. We find alignment improves capabilities in several languages. We conclude by discussing design decisions that led to these unintended impacts and recommendations for more equitable preference tuning. We make our code and data publicly available on GitHub.


Audiobook Narrators Fear Spotify Used Their Voices to Train AI

WIRED

Gary Furlong, a Texas-based audiobook narrator, had worried for a while that synthetic voices created by algorithms could steal work from artists like himself. Early this month, he felt his worst fears had been realized. Furlong was among the narrators and authors who became outraged after learning of a clause in contracts between authors and leading audiobook distributor Findaway Voices, which gave Apple the right to "use audiobooks files for machine learning training and models." Findaway was acquired by Spotify last June. Some authors and narrators say they were not clearly informed about the clause and feared it may have allowed their work or voices to contribute to Apple's development of synthetic voices for audiobooks.


The Unintended Beauty of Starlings - Issue 83: Intelligence

Nautilus

Eugene Schieffelin was the eccentric ornithologist who in 1890 shipped 60 starlings from London to New York City and set them free in Central Park. The next year he released 40 more, and today there are maybe 200 million starlings in the United States and Southern Canada. As immigrants go, starlings are shrewd flyers, clever mimics, and often unwelcome. The truth is they're no more than uptown blackbirds, stocky, three-ounce grifters with iridescent blue and green plumage, along with yellow beaks and a long history of displacing woodpeckers and flycatchers, and destroying entire crops of berries and cherries. Not to mention the havoc they cause at many airports.


Watch a Mesmerizing Swarm of Starlings

National Geographic

Called a murmuration, this defensive behavior is inspiring computer programming and other applications. Flocks of acrobatic starlings have long delighted observers, from Shakespeare to the present day. The birds--sometimes by the thousands--often seem to move as one, coursing through the air at breakneck speeds, turning on an instant. We recently published video of a beautiful starling swarm in the Netherlands. So many wings can be heard flapping in that video that it's easy to get why starling swarms are called murmurations.