Researchers at DeepMind, the artificial intelligence company owned by Google's parent company Alphabet, aim to develop an AI that produces the best response to players by probing their weaknesses. Using a method called reinforcement learning, the system identifies players' weaknesses in games and computes appropriate responses.
Computer games give scientists a convenient testbed for developing algorithms intended to solve real-life problems. This, in turn, may lay the groundwork for artificial general intelligence (AGI): a decision-making AI system that can reason not only about ordinary, repetitive tasks such as data entry, but also about its environment.
According to a new paper by DeepMind researchers, a system has been created that learns the best responses to players' moves in several games. In games including Chess and Go, the structure is reported to perform consistently well against 'worst opponents': players who are not skilled but who play to the end according to the rules.
Artificial intelligence that learns from weaknesses:
The level of performance against such players is called 'weakness' in the project. Calculating this vulnerability exactly is computationally intensive, since the number of possible action sequences a player can take is huge. For example, Heads-Up Limit Texas Hold'em, a restricted version of Texas Hold'em, has about 10^14 decision points, and in more complex games this number rises to around 10^170. One way to avoid this exhaustive computation is a method called reinforcement learning, with which an approximate best response can be calculated instead.
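To illustrate the idea (this is not the paper's method), here is a minimal sketch of learning an approximate best response to a fixed opponent with a simple sample-average reinforcement learning rule. The game, payoffs, and the opponent's rock-heavy strategy are invented for the example:

```python
import random

# Toy sketch: learn an approximate best response to a FIXED opponent
# strategy in rock-paper-scissors. The opponent's mixed strategy is an
# illustrative assumption, chosen to have an obvious "weakness".

ACTIONS = ["rock", "paper", "scissors"]
OPP_PROBS = [0.6, 0.2, 0.2]  # opponent over-plays rock

def payoff(a, b):
    """+1 win, 0 draw, -1 loss for our action a versus opponent action b."""
    if a == b:
        return 0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1 if (a, b) in wins else -1

random.seed(0)
q = {a: 0.0 for a in ACTIONS}  # running average payoff per action
n = {a: 0 for a in ACTIONS}    # times each action was tried
epsilon = 0.1                  # exploration rate

for _ in range(20000):
    # epsilon-greedy: usually pick the best-looking action, sometimes explore
    a = random.choice(ACTIONS) if random.random() < epsilon else max(q, key=q.get)
    b = random.choices(ACTIONS, weights=OPP_PROBS)[0]
    n[a] += 1
    q[a] += (payoff(a, b) - q[a]) / n[a]  # incremental sample average

best = max(q, key=q.get)
print(best)  # against a rock-heavy opponent, paper should come out on top
```

Against this opponent the expected payoffs are 0.4 for paper, 0.0 for rock and -0.4 for scissors, so the learner settles on paper — it has "found the weakness" without ever enumerating the opponent's strategy.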
The structure proposed by the DeepMind researchers is named Approximate Best Response Information Set Monte Carlo Tree Search (ABR IS-MCTS). It converges to an approximate best response on the basis of information states. Actors in the structure follow an algorithm to play the game, while a learner uses the results of many games to refine a style of play. In effect, ABR IS-MCTS learns to construct an accurate, exploiting counter-strategy. The weakness-seeking system is given unlimited access to the opponent's strategy, simulating what would happen if an adversary had trained for years to exploit that opponent's weaknesses.
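IS-MCTS builds on Monte Carlo Tree Search, whose core is a selection rule that balances exploiting moves that have scored well against exploring moves that have rarely been tried. A minimal sketch of the standard UCT rule is shown below; the node statistics are illustrative numbers, not data from the paper:

```python
import math

# UCT (Upper Confidence bound applied to Trees): the selection rule at the
# heart of MCTS variants such as IS-MCTS. Each child stores a total value
# and a visit count; the score is its mean value plus an exploration bonus.

def uct_score(child_value, child_visits, parent_visits, c=1.4):
    """Mean value plus exploration bonus; unvisited children go first."""
    if child_visits == 0:
        return float("inf")
    return child_value / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits
    )

# (total value, visit count) for the children of one tree node
children = [(6.0, 10), (3.0, 4), (0.0, 0)]
parent_visits = sum(v for _, v in children)
best = max(range(len(children)),
           key=lambda i: uct_score(*children[i], parent_visits))
print(best)  # the never-visited child (index 2) is selected for expansion
```

The exploration term shrinks as a child is visited more often, so the search gradually concentrates its simulations on the most promising moves.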
According to the researchers' data, in experiments with 200 players (trained on a computer with 4 processors and 8 GB of RAM) and a learner (trained on a computer with 10 processors and 20 GB of RAM), ABR IS-MCTS achieved a win rate above 50% in every game. In games other than Hex and Go (such as Connect Four and Breakthrough) the rate exceeded 70%, and after training for 1 million episodes it reached 80% success in backgammon.
However, ABR IS-MCTS is noted to be quite slow in some cases. In Kuhn Poker, a simplified two-player version of poker, for example, it took an average of 150 seconds to calculate the vulnerability of a particular type of strategy. Future research is aimed at developing strategies for more complex games.
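The 'vulnerability' being computed here is usually formalised as exploitability: how much a best-responding opponent can gain over the game's value. A toy sketch for a zero-sum matrix game follows; rock-paper-scissors stands in for Kuhn Poker, which would require a full game-tree traversal:

```python
# Exploitability sketch for a zero-sum matrix game. PAYOFF holds the row
# player's payoff; rows and columns are rock, paper, scissors.

PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def best_response_value(strategy):
    """Payoff a best-responding column player secures against `strategy`."""
    # the column player's payoff is the negative of the row player's
    return max(
        -sum(strategy[r] * PAYOFF[r][c] for r in range(3))
        for c in range(3)
    )

uniform = [1 / 3, 1 / 3, 1 / 3]  # the equilibrium strategy
rocky = [0.6, 0.2, 0.2]          # a rock-heavy, exploitable strategy

game_value = 0.0  # rock-paper-scissors is symmetric, so its value is zero
print(best_response_value(uniform) - game_value)  # 0.0: unexploitable
print(best_response_value(rocky) - game_value)    # 0.4: exploitable
```

The uniform strategy has zero exploitability, while the rock-heavy one loses 0.4 per round to a best responder (who always plays paper). Kuhn Poker's exploitability computation is the same idea applied over a game tree with hidden cards, which is why it is far more expensive.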