Over 12 days of play against 12 top poker players, Pluribus not only beat all of them but also earned winnings at a much greater rate than most professionals would expect.
According to a Facebook post, "Pluribus defeated pro players in both a 'five AIs + one human player' format and a 'one AI + five human players' format. If each chip was worth a dollar, Pluribus would have won an average of about $5 per hand and would have made about $1,000/hour playing against five human players. These results are considered a decisive margin of victory by poker professionals."
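The two figures in that quote also imply a pace of play. The short Python sketch below works through the arithmetic; the hands-per-hour and seconds-per-hand values are derived here from the quoted numbers and are not stated in the post itself.

```python
# Figures quoted in the Facebook post (one chip = one dollar).
winnings_per_hand = 5.0      # "about $5 per hand"
winnings_per_hour = 1000.0   # "about $1,000/hour"

# Implied pace of play: an inference from the two figures above,
# not a number given in the post.
hands_per_hour = winnings_per_hour / winnings_per_hand
seconds_per_hand = 3600 / hands_per_hour

print(f"Implied pace: {hands_per_hour:.0f} hands/hour, "
      f"about {seconds_per_hand:.0f} seconds per hand")
```

In other words, the quoted rates are consistent with roughly 200 hands an hour, a pace far quicker than a typical live table.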
Most poker players learn to read their opponents, looking for the 'tells' that indicate whether they are holding a good hand or bluffing: grimaces, smirks, blinking and all manner of 'odd' behaviour. The bot, of course, has access to none of this. Instead it relies on its ability to learn from the outcomes of individual hands and to develop appropriate strategies for play.
Initially, like many similar bots (Google's Go-playing bot, for instance), Pluribus played against itself, millions of times, in order to develop a strong sense of what a winning strategy might look like.
This graph shows how Pluribus' blueprint strategy improves during training on a 64-core CPU. Performance is measured against the final snapshot of training. We do not use search in these comparisons. Typical human and top human performance are estimated based on discussions with human professionals. The graphic also notes when Pluribus stops "limping," a passive strategy that advanced players typically avoid. Source: Brown's Facebook post.
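To give a feel for how self-play can produce a strategy, here is a toy Python sketch. It is not Pluribus' training code: the Science paper describes the blueprint strategy as computed with a form of Monte Carlo counterfactual regret minimization, whereas this example applies only the simpler regret-matching idea to rock-paper-scissors. The game, the function names (`payoff`, `strategy_from_regrets`, `self_play`) and the iteration count are all assumptions chosen for illustration.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors (toy game, not poker)

def payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, 0 tie, -1 loss."""
    return [0, 1, -1][(a - b) % 3]

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # no regrets yet: play uniformly
    return [p / total for p in positive]

def self_play(iterations, seed=0):
    """Two copies of the same learner play each other and update their regrets."""
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for player in range(2):
            opponent_move = moves[1 - player]
            actual = payoff(moves[player], opponent_move)
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than the move played.
                regrets[player][a] += payoff(a, opponent_move) - actual
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy over all iterations is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

average = self_play(100_000)
print([round(p, 2) for p in average[0]])
```

With enough iterations, each player's average strategy should settle close to playing each move about a third of the time, which is the equilibrium for this game. The real system faces a vastly larger game with hidden cards and six players, so this should be read only as an illustration of learning by playing against oneself.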
Unlike chess or Go, where all information is visible to all players, poker hides much of the game from its players: you never know what cards your opponent is holding until they are revealed (if they are ever revealed). This is the underlying motivation for the research: learning to maximise outcomes in situations where opponents are permitted to lie and where no entity has access to all of the information. According to Brown, "I think this is really the final major challenge in poker A.I. We don't plan to work on poker going forward. I think we're really focused on generalizing beyond."
The Facebook post adds, "Even though Pluribus was developed to play poker, the techniques used are not specific to poker and need not require any expert domain knowledge to develop. This research gives us a better fundamental understanding of how to build general AI that can cope with multi-agent environments, both with other AI agents and with humans, and allows us to benchmark progress in this field against the pinnacle of human ability."
Brown and Sandholm also published a more formal paper, "Superhuman AI for multiplayer poker", in the journal Science.