Emergent Tool Use from Multi-Agent Interaction
By OpenAI Team
September 19, 2019
We’ve observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported.
The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.
In our environment, agents play a team-based hide-and-seek game. Hiders (blue) are tasked with avoiding line-of-sight from the seekers (red), and seekers are tasked with keeping vision of the hiders. There are objects scattered throughout the environment that hiders and seekers can grab and lock in place, as well as randomly generated immovable rooms and walls that agents must learn to navigate. Before the game begins, hiders are given a preparation phase where seekers are immobilized to give hiders a chance to run away or change their environment.
There are no explicit incentives for agents to interact with objects in the environment; the only supervision given is through the hide-and-seek objective. Agents are given a team-based reward: hiders receive +1 if all hiders are hidden and -1 if any hider is seen by a seeker. Seekers receive the opposite reward, -1 if all hiders are hidden and +1 otherwise. To confine agent behavior to a reasonable space, agents are penalized if they go too far outside the play area. During the preparation phase, all agents are given zero reward.
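
The reward scheme described above can be sketched in a few lines of Python. This is only an illustration, not the environment's actual code: the function names (`team_rewards`, `agent_reward`), the `hider_seen` flags, and the `boundary_penalty` value are hypothetical. The size of the out-of-bounds penalty is not given here, so the default below is a placeholder.

```python
from typing import List, Tuple


def team_rewards(hider_seen: List[bool], in_prep_phase: bool) -> Tuple[float, float]:
    """Return (hider_reward, seeker_reward) for a single timestep.

    hider_seen[i] is True if hider i is in line-of-sight of any seeker.
    """
    # During the preparation phase, every agent receives zero reward.
    if in_prep_phase:
        return 0.0, 0.0

    # Hiders get +1 only if all of them are hidden, and -1 if any one is seen.
    hider_reward = -1.0 if any(hider_seen) else 1.0

    # Seekers receive the opposite reward.
    return hider_reward, -hider_reward


def agent_reward(team_reward: float, too_far_outside: bool,
                 boundary_penalty: float = 1.0) -> float:
    """Apply the out-of-bounds penalty to one agent's team reward.

    boundary_penalty is an assumed placeholder value. How the penalty
    interacts with the preparation phase is not specified in the text;
    this sketch simply subtracts it whenever the flag is set.
    """
    if too_far_outside:
        return team_reward - boundary_penalty
    return team_reward
```

For example, `team_rewards([False, True], in_prep_phase=False)` gives the hiders -1 and the seekers +1, because a single visible hider is enough to flip the team reward. Nothing in this function refers to boxes, ramps, or walls, which is why any object use has to come from the competition itself.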
As agents train against each other in hide-and-seek, as many as six distinct strategies emerge. Each new strategy creates a previously nonexistent pressure for agents to progress to the next stage. Note that there are no direct incentives for agents to interact with objects or to explore; rather, the emergent strategies shown below are a result of the autocurriculum induced by multi-agent competition and the simple dynamics of hide-and-seek.