micah
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan
As LLMs become more widely deployed, there is increasing interest in directly optimizing for feedback from end users (e.g. thumbs up) in addition to feedback from paid annotators. However, training to maximize human feedback creates a perverse incentive structure for the AI to resort to manipulative or deceptive tactics to obtain positive feedback from users who are vulnerable to such strategies. We study this phenomenon by training LLMs with Reinforcement Learning with simulated user feedback in environments of practical LLM usage. In our settings, we find that: 1) Extreme forms of "feedback gaming" such as manipulation and deception are learned reliably; 2) Even if only 2% of users are vulnerable to manipulative strategies, LLMs learn to identify and target them while behaving appropriately with other users, making such behaviors harder to detect; 3) To mitigate this issue, it may seem promising to leverage continued safety training or LLM-as-judges during training to filter problematic outputs. Instead, we found that while such approaches help in some of our settings, they backfire in others, sometimes even leading to subtler manipulative behaviors. We hope our results can serve as a case study which highlights the risks of using gameable feedback sources -- such as user feedback -- as a target for RL.
- North America > United States > New York (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Russia (0.04)
- Research Report > New Finding (0.48)
- Personal > Interview (0.46)
- Leisure & Entertainment (1.00)
- Law (1.00)
- Health & Medicine > Consumer Health (1.00)
This New Hotel Is the First in Africa to Introduce Robot Staff
Opened in November 2020, Hotel Sky in Sandton, Johannesburg, made its debut with three robots: Lexi, Micah, and Ariel. Lending a helpful hand to the human staff at the property, these robots are the hotel's answer to travelers' increased desire for socially distant interactions. Lexi, Micah, and Ariel can deliver room service, provide travel information, and carry up to 165 pounds of luggage each from the marble-floored lobby to the rooms.
Game Never Over
In March 2004, when René Koiter was 19, his twin brother Michel came down with a fever. René and Michel were students in the Netherlands--Michel at the Utrecht School of the Arts, René at the University of Utrecht--and they were doing freelance design work for Blizzard Entertainment, a video game developer about to launch its marquee franchise: World of Warcraft. Michel's fever wasn't supposed to be fatal. Michel was young and healthy--he and René were regulars at their local Taekwondo center. But a few days later, Michel's heart started failing, and René and their father rushed to the hospital to save him.
- Europe > Netherlands (0.24)
- Africa > Middle East > Libya > Benghazi District > Benghazi (0.04)
How Young Entrepreneur Raj Singh Used Artificial Intelligence To Change The Customer Service World
Raj Singh is a Los Angeles-based entrepreneur and product design expert. In 2011, after creating and growing a series of previous companies to seven-figure valuations, the energetic and personable Singh founded his current company, Go Moment, which, Singh says, is "dedicated to making customer service instant and unforgettable." Go Moment's platform, Ivy, is the world's leading automated customer service platform for hotels. By making use of Watson artificial intelligence (yes, that Watson, IBM's famously inhuman Jeopardy champion), Ivy is able to automatically correspond with and engage hotel guests and intelligently respond to guest inquiries and concerns via mobile messaging, in real time. Under Singh's leadership, the Ivy platform has been adopted by hundreds of hotels and casino properties nationwide, including well-known 4- and 5-star properties, and is available to assist literally millions of guests.
- North America > United States > California > Los Angeles County > Los Angeles (0.25)
- Asia > India (0.06)
- Asia > Thailand (0.05)
- Consumer Products & Services > Hotels (0.91)
- Leisure & Entertainment (0.88)
Featured Voices: Through the AI Eye
Most people subconsciously and automatically sort people into two genders. But what happens when a machine is tasked with this? Alyx explores the human side of artificial intelligence technology, and what it could mean for transgender people. Mirror, mirror, on my screen, which gender does my face seem? There are dozens of facial recognition websites where you upload your selfie and the site tries to guess your gender, age and mood. This is a form of Artificial Intelligence (AI) machine vision, and it's how the machine on the other end tries to understand who you are.