With the rapid developments in the field of artificial intelligence, machines have started to outperform humans in a growing number of fields. But how does the idea of an AI poker champion sound? Ridiculous, right? Well, after reading this article, you may be amazed to learn that we now have an AI that has mastered the game of Poker.
Think that algorithms have no place in this kind of competition?
Even where bluff and experience prevail over the “pure and simple” calculation of probability, the AI has learned how to adapt and win. The AI that got the better of the human mind in the game of Poker is known as Libratus. After AI beat humans at board games (chess, Go), this new artificial intelligence beat four of the best poker players in the world!
An AI dedicated to Poker
Unlike in traditional board games, where every player sees the whole position, the biggest difficulties in Poker are:
- The concealment of information (we do not see the opponent's cards)
- The falsification of information (players bluff by pretending to hold cards they do not have)
Working in this type of environment therefore required a dedicated artificial intelligence whose scalable algorithm makes it possible to exploit the little information it has.
Libratus, from the Latin for “balanced”, took 15 million core-hours of computation on a supercomputer in the United States to train. As with any learning-based AI (especially one based on reinforcement learning), the strategy is not determined in advance. Here, the chosen method was counterfactual regret minimization (CFR), which minimizes hypothetical regret. The key idea of CFR is this: the regret of an action is what we would have gained by always playing that action at that decision point in all the previous games, compared with what we actually gained there.
This essentially means that the AI learns by playing against itself.
For example, in the game of rock-paper-scissors, if we play rock and our opponent plays paper, we lose. We therefore regret not having played scissors. The regret asks: “If I had played scissors in all my previous games, would I have won more than by following my current strategy?” If we have played only one game, the answer is a direct yes. But if our opponent often played rock in earlier games, then the answer could be no, and switching to scissors would not be the better strategy. That is how the AI learns to maximize its winnings by avoiding losses as well as draws.
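The rock-paper-scissors example above can be sketched in a few lines of Python. This is only an illustration of the regret calculation, not Libratus's code: the scoring (+1 for a win, 0 for a draw, -1 for a loss) and the names `payoff` and `regrets` are assumptions made for this sketch.

```python
ACTIONS = ("rock", "paper", "scissors")
# Each key beats the action it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Assumed scoring: +1 for a win, 0 for a draw, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def regrets(history):
    """Regret of each fixed action over a list of (my_action, opponent_action) games.

    For each action, compare what we would have earned by always playing it
    against the same opponent moves with what we actually earned.
    """
    earned = sum(payoff(mine, theirs) for mine, theirs in history)
    return {
        action: sum(payoff(action, theirs) for _, theirs in history) - earned
        for action in ACTIONS
    }


# One game only: we played rock, the opponent played paper.
one_game = [("rock", "paper")]

# Same game, plus two earlier games in which the opponent played rock.
longer_history = [("paper", "rock"), ("paper", "rock"), ("rock", "paper")]

print(regrets(one_game))
print(regrets(longer_history))
```

With a single lost game, scissors has the largest positive regret. Once the earlier games against rock are included, the regret for scissors turns negative, which matches the "the answer could be no" case described above.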
At the start, the strategy is random, but after each game, all the decisions taken are reviewed using their regrets:
- If the regret of an action is positive, that action should be chosen more often than it currently is.
- If it is negative, the machine believes it has done the best thing possible and should continue as before.
The strategy is thus revised to take positive regrets into account: the probability of taking an action depends on how much it would have helped us win overall. To put it simply, the machine does not try to win one particular game but to win as much as possible over many games.
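The update rule described above can be sketched as plain regret matching on rock-paper-scissors, with two learners playing against each other. This is a toy illustration under stated assumptions, not Libratus's implementation: full CFR applies the same idea at every decision point of a much larger game tree, and the names `strategy_from` and `self_play`, the +1/0/-1 scoring, and the iteration count are choices made for this sketch.

```python
import random

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Assumed scoring: +1 for a win, 0 for a draw, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def strategy_from(regret_sum):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total == 0:
        # No action is regretted yet: fall back to a uniform random strategy.
        return [1.0 / len(ACTIONS)] * len(ACTIONS)
    return [p / total for p in positive]


def self_play(iterations, seed=0):
    """Let two regret-matching players face each other; return their average strategies."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    regret_sums = [[0.0] * n, [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [strategy_from(r) for r in regret_sums]
        moves = [rng.choices(range(n), weights=s)[0] for s in strategies]
        for player in (0, 1):
            mine, theirs = ACTIONS[moves[player]], ACTIONS[moves[1 - player]]
            earned = payoff(mine, theirs)
            for a in range(n):
                # Regret: what action a would have earned minus what we earned.
                regret_sums[player][a] += payoff(ACTIONS[a], theirs) - earned
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = self_play(20000)
for player, probs in enumerate(average):
    print(player, [round(p, 3) for p in probs])
```

The strategy used in any single game can swing sharply, but the average strategy over many games should drift toward playing each action about a third of the time, which is the balanced strategy for this game.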