Deep Science: Vision plus language could yield capable AI – TechCrunch
Depending on the theory of intelligence to which you subscribe, achieving "human-level" AI will require a system that can leverage multiple modalities -- e.g., sound, vision and text -- to reason about the world. For example, when shown an image of a toppled truck and a police cruiser on a snowy freeway, a human-level AI might infer that dangerous road conditions caused an accident. Or, running on a robot, when asked to grab a can of soda from the refrigerator, it would navigate around people, furniture and pets to retrieve the can and place it within reach of the requester.

Today's systems fall well short of that kind of general reasoning, but new research shows signs of encouraging progress, from robots that can figure out the steps needed to satisfy basic commands (e.g., "get a water bottle") to text-producing systems that learn from explanations. In this revived edition of Deep Science, our weekly series about the latest developments in AI and the broader scientific field, we're covering work out of DeepMind, Google and OpenAI that makes strides toward systems that can -- if not perfectly understand the world -- solve narrow tasks like generating images with impressive robustness.
Apr-10-2022, 17:18:00 GMT