Gemini Robotics uses Google's top language model to make robots more useful

MIT Technology Review 

Google DeepMind also announced that it is partnering with a number of robotics companies, including Agility Robotics and Boston Dynamics, on a second model, Gemini Robotics-ER, a vision-language model focused on spatial reasoning. The partnerships are meant to help refine that model. "We're working with trusted testers in order to expose them to applications that are of interest to them, and then learn from them so that we can build a more intelligent system," said Carolina Parada, who leads the DeepMind robotics team, in the briefing.

Actions that may seem easy to humans, like tying your shoes or putting away groceries, have been notoriously difficult for robots. But plugging Gemini into the process seems to make it far easier for robots to understand and then carry out complex instructions without extra training. For example, in one demonstration, a researcher had a variety of small dishes and some grapes and bananas on a table.