In our environment, agents play a team-based hide-and-seek game. Hiders (blue) are tasked with avoiding line-of-sight from the seekers (red), and seekers are tasked with keeping vision of the hiders. There are objects scattered throughout the environment that hiders and seekers can grab and lock in place, as well as randomly generated immovable rooms and walls that agents must learn to navigate. Before the game begins, hiders are given a preparation phase where seekers are immobilized to give hiders a chance to run away or change their environment. There are no explicit incentives for agents to interact with objects in the environment; the only supervision given is through the hide-and-seek objective.
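The only supervision the agents receive is the hide-and-seek objective itself. Below is a minimal sketch of what such a team reward could look like. It assumes a shared team reward of +1 or -1, computed from whether any hider is currently seen, and zero reward for everyone during the preparation phase. The function and parameter names (`team_rewards`, `hider_seen`, `n_seekers`) are hypothetical. Visibility is assumed to be computed elsewhere by the environment's line-of-sight check.

```python
from typing import List, Tuple


def team_rewards(
    in_prep_phase: bool,
    hider_seen: List[bool],
    n_seekers: int,
) -> Tuple[List[float], List[float]]:
    """Hypothetical per-step team rewards for hide-and-seek.

    hider_seen has one flag per hider: True if any seeker currently
    has line-of-sight to that hider (assumed computed by the environment).
    Returns (hider_rewards, seeker_rewards).
    """
    n_hiders = len(hider_seen)

    # Assumption: seekers are immobilized during preparation, so nobody is rewarded.
    if in_prep_phase:
        return [0.0] * n_hiders, [0.0] * n_seekers

    # Hiders win the step only if *all* hiders are out of sight.
    hider_reward = -1.0 if any(hider_seen) else 1.0

    # Every member of a team shares the team reward; seekers get the negation.
    return [hider_reward] * n_hiders, [-hider_reward] * n_seekers


# Example: two hiders, one of them spotted, against three seekers.
hiders, seekers = team_rewards(in_prep_phase=False, hider_seen=[False, True], n_seekers=3)
print(hiders, seekers)
```

Because the seekers' reward is simply the negation of the hiders', the game is zero-sum under this assumption: any strategy that improves one team's return creates pressure on the other team to adapt.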
Humans are a species that adapts to environmental challenges, and over eons this capacity has driven our biological evolution. Such adaptation is an essential characteristic of animals but is absent in AI. Although machine learning has made remarkable progress in complex games such as Go and Dota 2, the skills mastered in these arenas do not necessarily generalize to practical, real-world applications. A growing number of researchers aim to build machine intelligence that behaves, learns, and evolves more like humans. A new paper from San Francisco-based OpenAI proposes that training agents on the children's game of hide-and-seek, and pitting them against each other in tens of millions of contests, leads them to develop humanlike behaviors automatically, increasing their intelligence and improving their subsequent performance. Hide-and-seek was selected as a fun starting point mostly because of its simple rules, says the paper's first author, OpenAI researcher Bowen Baker.
Competition is one of the socio-economic dynamics that has influenced our evolution as a species. The vast complexity and diversity of life on Earth arose through co-evolution and competition between organisms, directed by natural selection. Competing against another party constantly forces us to improve our knowledge and skills in a given domain. Recent developments in artificial intelligence (AI) have started to leverage some principles of competition to shape learning behaviors in AI agents. In particular, the field of multi-agent reinforcement learning (MARL) has been heavily influenced by competitive and game-theoretic dynamics.
Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using moveable boxes which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and leads to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.
For decades, artificial intelligence scientists have sought to create intelligent machines by studying and replicating the structure and functionality of the human brain. Last week, researchers at the AI research lab OpenAI introduced a more fundamental approach to developing AI, a project inspired by natural selection and competition, the simple rules that have led to the evolution of all living beings, including humans. The researchers pitted multiple AI agents against each other in pursuit of conflicting goals and observed that, over the long term, the agents developed new and sophisticated behaviors. While the project draws on existing AI techniques and concepts, it might provide new approaches and ideas for creating AI applications.