Google's DeepMind shows off self-taught AlphaGo Zero AI

Crucially, the new program, called AlphaGo Zero, learned its skills with zero human training - working out its own style entirely from scratch.

In a paper published in Nature documenting the results, the DeepMind team explained that AlphaGo Zero is not only a better player than the original, it's also much more efficient.

Last year, DeepMind came out with its first version of AlphaGo.

DeepMind trained previous versions of the program by giving it a database of thousands of human-played games of Go. This process is called supervised learning, as the machine is instructed on the basis of human examples.
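
For a concrete picture of what that supervised setup looks like, here is a minimal sketch in Python using PyTorch. The random tensors stand in for the database of human games, and the network size and flat board encoding are invented for illustration; this is not DeepMind's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative supervised learning: fit a policy network to
# (board, expert_move) pairs taken from records of human games.
BOARD = 19 * 19  # one flat input plane; the real system used far richer features

policy_net = nn.Sequential(
    nn.Linear(BOARD, 256),
    nn.ReLU(),
    nn.Linear(256, BOARD),  # one logit per board point
)

optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def supervised_step(boards, expert_moves):
    """boards: (batch, 361) floats; expert_moves: (batch,) move indices."""
    logits = policy_net(boards)
    loss = loss_fn(logits, expert_moves)  # push the net toward the human's move
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Random tensors standing in for a database of human games:
boards = torch.randn(32, BOARD)
moves = torch.randint(0, BOARD, (32,))
print(supervised_step(boards, moves))
```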

There is no rest for the tired artificial intelligence: in the five months since being crowned world Go champion, AlphaGo has become stronger, smarter, and more powerful than ever.

"This is like asking an expert to make a prediction, rather than relying on the games of 100 weak players," said DeepMind's David Silver.

During the games, AlphaGo played a handful of highly inventive winning moves, several of which - including move 37 in game two - were so surprising they overturned hundreds of years of received wisdom, and have since been examined extensively by players of all levels.

The number of possible configurations of the Go board is greater than the number of atoms in the universe. Because of that complexity, it was long thought that only a human could master the game - an assumption that has since proved incorrect. And unlike its predecessors, AlphaGo Zero needs no human examples at all: "Instead, it is able to learn tabula rasa [the idea that knowledge comes from experience or perception] from the strongest player in the world: AlphaGo itself".
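
The "more configurations than atoms" claim is easy to sanity-check. Each of the board's 361 points can be empty, black, or white, which bounds the number of arrangements at 3^361; a few lines of Python confirm the scale against the usual estimate of roughly 10^80 atoms in the observable universe:

```python
import math

# Upper bound on Go board arrangements: each of the 361 points is
# empty, black, or white (most arrangements are illegal positions,
# but the bound is enough to make the point).
upper_bound = 3 ** 361
print(math.floor(math.log10(upper_bound)))  # 172, i.e. roughly 10**172

# Versus ~10**80 atoms in the observable universe, this bound is
# larger by about 90 orders of magnitude.
```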

The program hasn't mastered Go entirely without human knowledge, one critic counters, because "actually prior knowledge has gone into the construction of the algorithm itself".

Go is exemplary in many ways of the difficulties faced by artificial intelligence: a challenging decision-making task, an intractable search space, and an optimal solution so complex it appears infeasible to directly approximate using a policy or value function. But how was this possible?

But there are some situations where an AI can train itself: rules-based systems in which the computer can evaluate its own actions and determine if they were good ones. After 40 days, AlphaGo Zero was arguably the best thing to ever play Go. Zero stands for total domination, it seems.
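
To see how a program can grade its own actions from the rules alone, here is a toy self-play learner for the far simpler game of Nim (take one to three stones; whoever takes the last stone wins). Everything here - the game, the lookup table, the learning rate - is an illustrative stand-in, not AlphaGo Zero's method, which pairs deep networks with tree search.

```python
import random
from collections import defaultdict

# Toy self-play: the rules of Nim score every finished game
# (win = +1, loss = -1), so no human examples are needed.
Q = defaultdict(float)  # learned value of (pile_size, take) for the player to move
ACTIONS = (1, 2, 3)

def choose(pile, eps=0.1):
    moves = [a for a in ACTIONS if a <= pile]
    if random.random() < eps:
        return random.choice(moves)                # explore
    return max(moves, key=lambda a: Q[(pile, a)])  # exploit

for episode in range(20000):
    pile, history = 15, []
    while pile > 0:
        a = choose(pile)
        history.append((pile, a))
        pile -= a
    reward = 1.0  # the player who emptied the pile won
    for state_action in reversed(history):
        Q[state_action] += 0.1 * (reward - Q[state_action])
        reward = -reward  # the opponent saw the opposite outcome

print(choose(15, eps=0))  # learned first move; optimal is 3, since 15 % 4 == 3
```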

DeepMind, Google's AI research organization, announced today in a blog post that AlphaGo Zero, the latest evolution of AlphaGo (the first computer program to defeat a Go world champion), trained itself within three days to play Go at a superhuman level - that is, better than any human - and to beat the old version of AlphaGo, all without leveraging human expertise, data, or training.

Even more impressive, the program also discovered novel combinations of moves that human Go masters had never even conceived.

"This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge". In just 36 hours, AlphaGo Zero was ready to knock its predecessor off the top of the mountain.

AlphaGo Zero is not remarkable only for its playing strength, however; it has other considerable advantages.

While the original required 48 specially built AI processors to run, AlphaGo Zero needs just four. Earlier versions also relied on two neural networks, one to choose moves and a second trained to predict the victor of the game following a given move; Zero folds both jobs into a single network. Another outcome of the use of a single network is that the required hardware is roughly ten times less expensive than that of the previous versions (it still costs US$25 million, by the way).
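
A rough sketch of that single-network idea, again in PyTorch: one shared trunk feeds both a policy head (which move to play) and a value head (who is predicted to win). The layer sizes are invented for illustration and bear no relation to the published architecture.

```python
import torch
import torch.nn as nn

# One shared trunk, two heads: policy (move logits) and value
# (predicted winner), replacing two separate networks.
BOARD = 19 * 19

class PolicyValueNet(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(BOARD, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, BOARD)  # logits over moves
        self.value_head = nn.Sequential(nn.Linear(hidden, 1),
                                        nn.Tanh())   # -1 (loss) .. +1 (win)

    def forward(self, board):
        h = self.trunk(board)
        return self.policy_head(h), self.value_head(h)

net = PolicyValueNet()
policy_logits, value = net(torch.randn(1, BOARD))
print(policy_logits.shape, value.item())
```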
