Toward a machine learning model that can reason about everyday actions
The ability to reason abstractly about events as they unfold is a defining feature of human intelligence. We know instinctively that crying and writing are means of communicating, and that a panda falling from a tree and a plane landing are variations on descending.

Organizing the world into abstract categories does not come easily to computers, but in recent years researchers have inched closer by training machine learning models on words and images infused with structural information about the world and about how objects, animals, and actions relate. In a new study presented at the European Conference on Computer Vision this month, researchers unveiled a hybrid language-vision model that can compare and contrast a set of dynamic events captured on video to tease out the high-level concepts connecting them. Their model performed as well as or better than humans on two types of visual reasoning tasks: picking the video that conceptually best completes the set, and picking the video that doesn't fit.
Sep-1-2020, 23:31:06 GMT