Both the DeepMind and CMU approaches use deep reinforcement learning, popularized by DeepMind's Atari-playing AI. A neural network is fed raw pixel data from a virtual environment and uses rewards, like points in a computer game, to learn by trial and error (see "10 Breakthrough Technologies 2017: Reinforcement Learning"). By running through millions of training scenarios at accelerated speeds, both AI programs learned to associate words with particular objects and characteristics, which let them follow the commands. The millions of training runs required means Domingos is not convinced pure deep reinforcement learning will ever crack the real world.
Abstract: We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions.