Colin G. Johnson, an associate professor at the University of Nottingham, recently developed a deep-learning technique that can learn a so-called "fitness function" from a set of sample solutions to a problem. The technique, presented in a paper published in Wiley's Expert Systems journal, was initially trained to solve the Rubik's Cube, the popular 3-D combination puzzle invented by Hungarian sculptor Ernő Rubik. "The aim of our paper was to use machine learning to learn to solve the Rubik's cube," Johnson told TechXplore. "Rubik's cube is a very complex puzzle, but any of the vast number of combinations is at most 20 steps from a solution. So the approach we take here is to try and solve the problem by learning to do each of those steps individually."
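The step-by-step idea can be sketched in code. In the sketch below, a hand-written "misplaced pieces" count stands in for the learned fitness function, and a toy one-dimensional permutation puzzle stands in for the cube, so the example stays self-contained; the paper's actual model, state representation, and move set are not reproduced here.

```python
# Hedged sketch: greedy solving guided by a "fitness" (distance-to-solved)
# function. In the paper this function is learned by a deep network from
# sample solutions; here a simple misplaced-piece count stands in, and the
# "cube" is a toy 1-D permutation puzzle with adjacent-swap moves.

SOLVED = tuple(range(6))

def apply_move(state, move):
    """A 'move' swaps two adjacent positions (a stand-in for a cube twist)."""
    s = list(state)
    s[move], s[move + 1] = s[move + 1], s[move]
    return tuple(s)

def fitness(state):
    """Stand-in for the learned fitness: misplaced pieces (lower is better)."""
    return sum(a != b for a, b in zip(state, SOLVED))

def greedy_solve(state, max_steps=50):
    """Take, one step at a time, the move the fitness function scores best."""
    path = []
    for _ in range(max_steps):
        if state == SOLVED:
            return path
        move = min(range(len(state) - 1),
                   key=lambda m: fitness(apply_move(state, m)))
        state = apply_move(state, move)
        path.append(move)
    return None  # gave up within the step budget

print(greedy_solve((1, 0, 2, 4, 3, 5)))  # → [0, 3]
```

Pure greedy descent can stall on a real cube, which is why a learned fitness function (rather than a naive sticker count) matters: it must score intermediate states that look superficially worse but lie on the path to a solution.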

The Rubik's Cube is a three-dimensional puzzle developed in 1974 by the Hungarian inventor Ernő Rubik; the object is to align all squares of the same color on the same face of the cube. It became an international best-selling toy, with over 350 million units sold. The puzzle has also attracted considerable interest from computer scientists and mathematicians, one long-standing question being the smallest number of moves needed to solve it from any position. The answer turns out to be 20 when a half-turn counts as a single move (proved in 2010), or 26 when only quarter-turns count (proved in 2014).

Incredibly, the system learned to master the classic 3-D puzzle in just 44 hours and without any human intervention. "A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision," write the authors of the new paper, published online on the arXiv preprint server. Indeed, if we're ever going to achieve general, human-like machine intelligence, we'll have to develop systems that can teach themselves and then apply what they learn to real-world tasks. Recent breakthroughs in machine learning have produced systems that, without any prior knowledge, learned to master games like chess and Go, but those approaches haven't translated well to the Rubik's Cube.
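One way such a system can teach itself without human supervision is to generate its own training data by scrambling outward from the solved state: a state reached after k random moves is roughly k moves from solved, which gives the learner a label for free. The sketch below illustrates that idea on a toy three-piece puzzle, with a dictionary standing in for the neural network; the real training setup in the paper is far more involved.

```python
# Hedged sketch of "scramble from solved" self-training: states generated by
# random scrambles are labelled with their scramble depth, an upper bound on
# their true distance to solved. A dict plays the role of the learned value
# function; the puzzle is a toy 3-element one with adjacent-swap moves.
import random

SOLVED = (0, 1, 2)

def apply_move(state, move):
    s = list(state)
    s[move], s[move + 1] = s[move + 1], s[move]
    return tuple(s)

def train_value_table(max_depth=5, episodes=1000, seed=0):
    rng = random.Random(seed)
    value = {SOLVED: 0}
    for _ in range(episodes):
        state = SOLVED
        for depth in range(1, max_depth + 1):
            state = apply_move(state, rng.randrange(len(SOLVED) - 1))
            # keep the smallest depth seen: the best known distance to solved
            value[state] = min(value.get(state, depth), depth)
    return value

table = train_value_table()
print(table[(1, 0, 2)])  # a one-move scramble ends up with value 1
```

On the full cube the state space is far too large for a table, which is where the deep network comes in: it generalizes the same depth-derived training signal across states it has never seen.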

Human hands let us solve a wide variety of tasks. We've trained a pair of neural networks to solve the Rubik's Cube with a human-like robot hand. The networks are trained entirely in simulation, using the same reinforcement learning code as OpenAI Five paired with a new technique called Automatic Domain Randomization (ADR). The resulting system can handle situations it never saw during training, such as being prodded by a stuffed giraffe. This shows that reinforcement learning isn't just a tool for virtual tasks: it can solve physical-world problems requiring unprecedented dexterity.
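The core idea of domain randomization is simple: each simulated episode draws its physics parameters (friction, object size, and so on) from a range, so a policy that succeeds across all of them has a better chance of surviving the messiness of the real world. The sketch below is a simplified stand-in, not OpenAI's implementation: the parameter names and ranges are invented for illustration, and ADR's actual mechanism (widening ranges automatically as performance improves) is reduced to a fixed expansion step.

```python
# Hedged sketch of domain randomization. Each simulated episode samples its
# physics parameters from ranges; ADR additionally widens those ranges as
# training succeeds. The parameter names and the fixed widening factor below
# are illustrative assumptions, not values from the OpenAI system.
import random

def sample_env(ranges, rng):
    """Draw one randomized simulation environment from the current ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def widen(ranges, factor=1.1):
    """ADR-style step: once the policy performs well, widen every range."""
    return {name: (lo / factor, hi * factor) for name, (lo, hi) in ranges.items()}

ranges = {"friction": (0.8, 1.2), "cube_size_cm": (5.5, 5.9)}
rng = random.Random(0)
env = sample_env(ranges, rng)        # one randomized training episode
print(sorted(env))                   # ['cube_size_cm', 'friction']
ranges = widen(ranges)
print(round(ranges["friction"][1], 2))  # → 1.32
```

Gradually widening the ranges is what lets training start on an easy, narrow distribution and end on one broad enough to cover disturbances, like the stuffed giraffe, that were never modeled explicitly.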