In our environment, agents play a team-based hide-and-seek game. Hiders (blue) are tasked with avoiding line-of-sight from the seekers (red), and seekers are tasked with keeping vision of the hiders. There are objects scattered throughout the environment that hiders and seekers can grab and lock in place, as well as randomly generated immovable rooms and walls that agents must learn to navigate. Before the game begins, hiders are given a preparation phase where seekers are immobilized to give hiders a chance to run away or change their environment. There are no explicit incentives for agents to interact with objects in the environment; the only supervision given is through the hide-and-seek objective.
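To make that supervision signal concrete, here is a minimal sketch of the per-step team reward this setup implies. The ±1 values and the zero-reward preparation phase follow the published description of the environment, but the function and argument names are hypothetical illustrations, not taken from the released code.

```python
def hide_and_seek_rewards(any_hider_seen: bool, in_prep_phase: bool):
    """Per-timestep team rewards, returned as (hider_reward, seeker_reward).

    The only supervision is the visibility outcome: hiders are rewarded when
    every hider is out of the seekers' line of sight, seekers when at least
    one hider is visible. Nothing rewards grabbing, locking, or moving boxes
    and ramps; any tool use has to emerge from the competition itself.
    """
    if in_prep_phase:
        # Seekers are immobilized and (in this sketch) no one is rewarded,
        # so hiders can reposition or rearrange objects without penalty.
        return 0.0, 0.0
    if any_hider_seen:
        return -1.0, +1.0  # seekers are winning this step
    return +1.0, -1.0      # hiders are winning this step

print(hide_and_seek_rewards(any_hider_seen=False, in_prep_phase=False))  # (1.0, -1.0)
```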
Humans adapt to environmental challenges, and over eons this adaptability has driven our biological evolution, a capacity shared across the animal kingdom but still absent in AI. Although machine learning has made remarkable progress in complex games such as Go and Dota 2, the skills mastered in these arenas do not necessarily generalize to practical real-world applications. A growing number of researchers therefore aim to build machine intelligence that behaves, learns, and evolves more like humans do. A new paper from San Francisco-based OpenAI reports that training agents in the children's game of hide-and-seek and pitting them against each other in tens of millions of contests leads the agents to automatically develop humanlike behaviors that increase their intelligence and improve subsequent performance. Hide-and-seek was selected as a fun starting point mostly because of its simple rules, says the paper's first author, OpenAI researcher Bowen Baker.
Competition is one of the dynamics that has shaped our evolution as a species. Much of the complexity and diversity of life on Earth arose through co-evolution and competition between organisms, directed by natural selection. Competing against another party constantly forces us to improve our knowledge and skills in a given domain. Recent developments in artificial intelligence (AI) have started to leverage these principles of competition to shape learning behavior in AI agents. In particular, the field of multi-agent reinforcement learning (MARL) has been heavily influenced by competitive and game-theoretic dynamics.
Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using movable boxes, which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and lead to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.
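As a rough illustration of that evaluation protocol, the sketch below compares fine-tuning from different initializations on a small suite of targeted tasks. Everything here is hypothetical scaffolding: the task names only echo the kinds of tests described in the paper (e.g., object counting, shelter construction), and the scoring function, parameter shapes, and training routine are toy placeholders rather than the authors' implementation.

```python
import numpy as np

def finetune_and_score(init_params, task_seed):
    """Stand-in for 'fine-tune this initialization on one targeted task,
    then report a final score'. A real version would run an RL algorithm;
    this toy returns a deterministic number so the skeleton executes."""
    rng = np.random.default_rng(task_seed)
    return float(np.tanh(init_params.mean()) + rng.normal(0.0, 0.01))

# The three initializations being compared: hide-and-seek pre-training,
# an intrinsic-motivation baseline, and random initialization.
# The weight vectors below are placeholders, not real pretrained policies.
initializations = {
    "hide_and_seek":        np.full(128, 0.5),
    "intrinsic_motivation": np.full(128, 0.2),
    "random_init":          np.zeros(128),
}

# Targeted "intelligence tests"; names are indicative only.
targeted_tasks = {"object_counting": 0, "lock_and_return": 1, "shelter_construction": 2}

for name, params in initializations.items():
    scores = [finetune_and_score(params, seed) for seed in targeted_tasks.values()]
    print(f"{name:22s} mean transfer score: {np.mean(scores):.3f}")
```

The key point is the comparison structure: every initialization gets the same fine-tuning budget and the same task suite, so differences in final score can be attributed to what the pre-training produced.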
Programmers at OpenAI, an artificial intelligence research company, recently taught a gaggle of intelligent artificial agents -- bots -- to play hide-and-seek. Not because they cared who won: The goal was to observe how competition between hiders and seekers would drive the bots to find and use digital tools. The idea is familiar to anyone who's ever played the game in real life; it's a kind of scaled-down arms race. When your opponent adopts a strategy that works, you have to abandon what you were doing before and find a new, better plan. It's the rule that governs games from chess to StarCraft II; it's also an adaptation that seems likely to confer an evolutionary advantage. So it went with hide-and-seek.