Gato, the latest from DeepMind. Towards true AI?


The deep learning field is progressing rapidly, and the latest work from DeepMind is a good example. Their Gato model can play Atari games, generate realistic text, process images, control robotic arms, and more, all with the same neural network. Inspired by large-scale language models, DeepMind applied a similar approach but extended it beyond text outputs. This new step towards AGI (Artificial General Intelligence) works as a multi-modal, multi-task, multi-embodiment network: the same network (i.e. a single architecture with a single set of weights) performs all tasks, even though they involve inherently different kinds of inputs and outputs. While DeepMind's preprint presenting Gato is not very detailed, it makes clear that the model is strongly rooted in transformers as used for natural language processing and text generation.
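To get an intuition for how one set of weights could handle such different inputs and outputs, it helps to think of every modality as being turned into tokens from one shared vocabulary, so that a transformer only ever sees a flat sequence of integers. The snippet below is a toy sketch of that idea, not DeepMind's code: the vocabulary sizes, the uniform binning, and all function names are assumptions made up for illustration.

```python
# Toy illustration: serialize text and continuous values into one token sequence.
# All constants and names here are hypothetical, chosen only for this example.

TEXT_VOCAB_SIZE = 32000  # assumed number of ids reserved for text tokens
NUM_BINS = 1024          # assumed number of bins for continuous values


def tokenize_text(words, vocab):
    """Map each word to an integer id using a toy lookup table."""
    return [vocab[word] for word in words]


def tokenize_continuous(values):
    """Clip values to [-1, 1], bin them uniformly, and shift past the text ids."""
    tokens = []
    for value in values:
        clipped = max(-1.0, min(1.0, value))
        bin_index = min(NUM_BINS - 1, int((clipped + 1.0) / 2.0 * NUM_BINS))
        tokens.append(TEXT_VOCAB_SIZE + bin_index)
    return tokens


# A toy vocabulary and a toy robot-arm observation (e.g. joint angles).
toy_vocab = {"pick": 0, "up": 1, "the": 2, "block": 3}
instruction = tokenize_text(["pick", "up", "the", "block"], toy_vocab)
joint_angles = tokenize_continuous([-1.0, 0.0, 1.0])

# One flat integer sequence, regardless of where each token came from.
sequence = instruction + joint_angles
print(sequence)
```

Because text ids and binned values occupy disjoint ranges of the same vocabulary, a single sequence model can in principle read and predict both kinds of token without any modality-specific output head.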

