AI that understands the world from a first-person point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones. Imagine your AR device displaying exactly how to hold the sticks during a drum lesson, guiding you through a recipe, helping you find your lost keys, or recalling memories as holograms that come to life in front of you. To build these new technologies, we need to teach AI to understand and interact with the world like we do, from a first-person perspective -- commonly referred to in the research community as egocentric perception. Today's computer vision (CV) systems, however, typically learn from millions of photos and videos that are captured in third-person perspective, where the camera is just a spectator to the action. "Next-generation AI systems will need to learn from an entirely different kind of data -- videos that show the world from the center of the action, rather than the sidelines," says Kristen Grauman, lead research scientist at Facebook.
Facebook is pouring a lot of time and money into augmented reality, including building its own AR glasses with Ray-Ban. Right now, these gadgets can only record and share imagery, but what does the company think such devices will be used for in the future? A new research project led by Facebook's AI team suggests the scope of the company's ambitions. It imagines AI systems that are constantly analyzing peoples' lives using first-person video; recording what they see, do, and hear in order to help them with everyday tasks. Facebook's researchers have outlined a series of skills it wants these systems to develop, including "episodic memory" (answering questions like "where did I leave my keys?") and "audio-visual diarization" (remembering who said what when).
For the last two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to assemble the largest ever data set of first-person video--specifically to train deep-learning image-recognition models. AIs trained on the data set will be better at controlling robots that interact with people, or interpreting images from smart glasses. "Machines will be able to help us in our daily lives only if they really understand the world through our eyes," says Kristen Grauman at FAIR, who leads the project. Such tech could support people who need assistance around the home, or guide people in tasks they are learning to complete. "The video in this data set is much closer to how humans observe the world," says Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who is not involved in Ego4D.
Facebook has announced a research project that aims to push the "frontier of first-person perception", and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it. In September, the social media giant launched a line of "smart glasses" called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of privacy invasion. Tickets to TNW Conference 2022 are available now!
To operate in augmented and virtual reality, Facebook believes artificial intelligence will need to develop an "egocentric perspective." To that end, the company on Thursday announced Ego4D, a data set of 2,792 hours of first-person video, and a set of benchmark tests for neural nets, designed to encourage the development of AI that is savvier about what it's like to move through virtual worlds from a first-person perspective. The project is a collaboration between Facebook Reality Labs and scholars from 13 research institutions, including academic institutions and research labs. The details are laid out in a paper lead-authored by Facebook's Kristen Grauman, "Ego4D: Around the World in 2.8K Hours of Egocentric Video." Grauman is a scientist with the company's Facebook AI Research unit.