Next week, scientists working on artificial intelligence (AI) and games will be watching the latest human-machine matchup. But instead of a single pensive player squaring off against a computer, a team of five top video game players will be furiously casting magic spells and lobbing (virtual) fireballs at a team of bots called OpenAI Five. They’ll be playing the real-time strategy game Dota 2 at The International in Vancouver, Canada, an annual e-sports tournament that draws professional gamers who compete for millions of dollars.
In 1997, IBM’s Deep Blue AI bested chess champion Garry Kasparov. In 2016, DeepMind’s AlphaGo AI beat Lee Sedol, a world master, at the traditional Chinese board game Go. Computers have also defeated humans in checkers and some forms of poker. But fast-paced multiplayer video games pose a different kind of challenge, requiring computers to collaborate and manage unpredictability. The broader goal is a kind of common sense that could help AIs handle real-world situations such as navigating traffic and providing home care, even if they never have to face a magic spell.
“The next big thing for AI is collaboration,” says Jun Wang, a computer scientist at University College London who works on StarCraft II, another real-time strategy game. That requires “strategic reasoning, where it’s understanding the incentives of others,” says Jakob Foerster, a computer scientist at the University of Oxford in the United Kingdom, who also works on StarCraft II.
Dota 2, released in 2013, has millions of players around the world. In a game, teams fight to destroy a structure on their enemy’s turf while defending their own, all the while collecting resources to increase their strength and skills. A well-matched game lasts about 45 minutes. A year ago, OpenAI, a research nonprofit based in San Francisco, California, revealed an AI that could beat the best human players in one-on-one games. But the five-on-five matchups showcased at The International present a much bigger challenge for a computer because the games are longer and more chaotic, says Greg Brockman, OpenAI’s co-founder and chief technology officer. Still, in a warmup exhibition last week, OpenAI Five easily beat a team of former pro players. “It sucks getting embarrassed by a nonperson,” says William “Blitz” Lee, who lost on stage in front of a live audience. “We were just getting crushed left and right.”
The range of possible moves in Dota 2 is far greater than in chess or Go, where each move has at most a few hundred options. In Dota 2, the action is constant, and players have thousands of options per move—where to flee, which spell to use, where to aim it. Such freedom, combined with the game’s inherent randomness and players’ ignorance about what’s out of view, means you can’t perfectly predict what the game will look like even one move ahead. In chess and Go, algorithms use search trees, analyzing branching possibilities far into the future. In Dota 2, forecasts become fuzzy much more quickly.
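For a rough sense of why those trees break down, consider a back-of-envelope comparison. The branching factors below are commonly cited approximations rather than figures from the article, and the Dota 2 number is only a stand-in for "thousands of options":

```python
# Rough game-tree growth for typical per-move branching factors.
# Approximate figures: ~35 legal moves in chess, up to a few hundred
# in Go; Dota 2's effective choice count is often put in the thousands
# once movement, targets, items, and abilities are combined.
BRANCHING = {"chess": 35, "go": 250, "dota2 (approx.)": 1000}

DEPTH = 5  # look just five decisions ahead

for game, b in BRANCHING.items():
    print(f"{game:>15}: {b:>5}^{DEPTH} = {b**DEPTH:.2e} positions")
```

Even five decisions deep, the Dota 2 tree is billions of times larger than the chess tree, before accounting for randomness and hidden information.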
So instead of relying on search trees, OpenAI Five uses neural networks, algorithms inspired by the brain that strengthen connections between small computing elements in response to feedback. (AlphaGo combined neural nets with search trees.) During training, the system blindly experiments with different moves in the game. When they perform well, the connections responsible for those acts are reinforced. After thousands of years of (sped-up) gameplay, strong strategies emerge. OpenAI applied this method, known as reinforcement learning, on a massive scale, running the algorithm on thousands of computers at once. “OpenAI Five is one of the most impressive demonstrations of reinforcement learning I have seen,” says Niels Justesen, a computer scientist at the IT University of Copenhagen who also works on StarCraft II.
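As a minimal sketch of that training loop, not OpenAI's actual system, the toy below reinforces whichever of two actions tends to "perform well." The task, the reward structure, and the learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "blind experimentation": a two-action task where
# action 1 pays off more often. A real agent would observe game state;
# here the "policy" is just a preference score per action.
TRUE_WIN_RATE = np.array([0.3, 0.7])  # hidden from the learner
prefs = np.zeros(2)                   # connection strengths to reinforce
LEARNING_RATE = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for episode in range(5000):
    probs = softmax(prefs)
    action = rng.choice(2, p=probs)
    reward = float(rng.random() < TRUE_WIN_RATE[action])  # did it "perform well"?
    # REINFORCE-style update: strengthen the preference for whatever
    # led to reward, weaken it otherwise (a 0.5 baseline centers updates).
    grad = -probs
    grad[action] += 1.0
    prefs += LEARNING_RATE * (reward - 0.5) * grad

print("learned action probabilities:", softmax(prefs).round(3))
# After training, the policy strongly prefers the better action.
```

OpenAI Five applies the same basic idea to full games rather than single actions, with deep neural networks as the policy and thousands of machines generating experience in parallel.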
Parsing last week’s warmup performance, Michael Cook, a computer scientist at Falmouth University in the United Kingdom who studies AI and games, says OpenAI Five excels by relying on a “superhuman ability to calculate the outcome of certain actions,” such as the damage a particular attack will inflict on an opponent. OpenAI handicapped it to have the same reaction time as a human player, about a fifth of a second, but in that blink, the system processes much more information. Such thoroughness and precision make it deadly at fights, leading to “a Blitzkrieg-like approach to the game,” Cook says. “It’s amazing to watch.” But the aggressiveness may mask a weakness in long-term strategizing, Cook suggests: In one game that it lost, the AI was assigned characters that needed more time to build up their abilities, and it couldn’t adjust.
On the face of it, OpenAI Five also appears to succeed at collaboration. The AI’s five players were quite willing to be killed for the overall good of the team, which might confer an advantage over the human teams. “The bot plays very sacrificially,” Lee says. Humans are less likely to give up a player in order to win, he says. “It’s a very human concept to be greedier.” But the AI relies on a kind of hive mind that may make coordination easier. Each of the five nearly identical algorithms in the system gets a peek at what the others see, whereas humans see only what’s on their own screens and share information only by talking. Wang says that to collaborate with people or programs unlike themselves, whether in games or in life, the algorithms will eventually need to develop communication skills and “theory of mind”: models of the beliefs and desires of other people and algorithms.
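A toy illustration of that difference, with an invented map and sight ranges rather than anything from Dota 2: pooling five partial views, as the bots effectively do, reveals far more than any single screen.

```python
import numpy as np

rng = np.random.default_rng(1)
MAP_CELLS = 100
enemy_positions = set(rng.choice(MAP_CELLS, size=5, replace=False))

# Each of five agents "sees" only a random window of 20 map cells.
agent_views = [set(rng.choice(MAP_CELLS, size=20, replace=False))
               for _ in range(5)]

# Human-like play: each player acts on their own screen alone.
human_like = [len(enemy_positions & view) for view in agent_views]

# Hive-mind play: every agent acts on the union of all five views.
shared = set().union(*agent_views)
bot_like = len(enemy_positions & shared)

print("enemies each isolated agent spots:", human_like)
print("enemies visible to the pooled team view:", bot_like)
```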
StarCraft II, the game many AI researchers prefer to work on, may be better for honing long-term planning. It is more like economics, Wang says, because it entails managing a colony’s resources while building weapons factories. And other games may provide better tests of an AI’s ability to model the mind of an opponent than Dota 2; Foerster mentions strategy board games such as Settlers of Catan and Risk, in which multiple players must negotiate, trade, and form alliances, both cooperating and competing.
Even so, Dota 2 remains a worthy test for an AI. Many experts expect OpenAI Five to win at The International. But Vanessa Volz, a computer scientist at the Technical University of Dortmund in Germany who studies AI and games, sees a potential weakness: OpenAI Five trains its algorithms through self-play. “This approach has the risk of being vulnerable to previously unseen playing styles,” she says. Lee, who lost to the AI, feels the same way. “Right now, the bot is a little too rigid,” he says. “It’s starting to get a little too predictable. I feel like if we’d had a few more games we would have been able to take games off it pretty cleanly.”
Source: sciencemag.org