AI that understands the world from a first-person point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones. Imagine your AR device displaying exactly how to hold the sticks during a drum lesson, guiding you through a recipe, helping you find your lost keys, or recalling memories as holograms that come to life in front of you. To build these new technologies, we need to teach AI to understand and interact with the world as we do, from a first-person perspective -- commonly referred to in the research community as egocentric perception. Today's computer vision (CV) systems, however, typically learn from millions of photos and videos captured from a third-person perspective, where the camera is merely a spectator to the action. "Next-generation AI systems will need to learn from an entirely different kind of data -- videos that show the world from the center of the action, rather than the sidelines," says Kristen Grauman, lead research scientist at Facebook.
Facebook announced a research project Thursday that aims to develop an artificial intelligence capable of perceiving the world like a human being. The project, titled Ego4D, aims to train an artificial intelligence (AI) to perceive the world in the first person by analyzing a constant stream of video from people's lives. This type of data, which Facebook calls "egocentric" data, is designed to help the AI perceive, remember and plan like a human being. The project aims to improve AI technology's capacity for human-like tasks by setting five key benchmarks: "episodic memory," in which the AI ties memories to specific locations and times; "forecasting"; "social interaction"; "hand and object manipulation"; and "audio-visual diarization," in which the AI ties auditory experiences to specific locations and times.
Facebook is pouring a lot of time and money into augmented reality, including building its own AR glasses with Ray-Ban. Right now, these gadgets can only record and share imagery, but what does the company think such devices will be used for in the future? A new research project led by Facebook's AI team suggests the scope of the company's ambitions. It imagines AI systems that constantly analyze people's lives using first-person video, recording what they see, do, and hear in order to help them with everyday tasks. Facebook's researchers have outlined a series of skills it wants these systems to develop, including "episodic memory" (answering questions like "where did I leave my keys?") and "audio-visual diarization" (remembering who said what, and when).
Facebook has announced a research project that aims to push the "frontier of first-person perception", and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and extract useful information from it. In September, the social media giant launched a line of "smart glasses" called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of privacy invasion.
To operate in augmented and virtual reality, Facebook believes artificial intelligence will need to develop an "egocentric perspective." To that end, the company on Thursday announced Ego4D, a data set of 2,792 hours of first-person video, along with a set of benchmark tests for neural nets, designed to encourage the development of AI that is savvier about what it's like to move through virtual worlds from a first-person perspective. The project is a collaboration between Facebook Reality Labs and scholars from 13 universities and research labs. The details are laid out in a paper lead-authored by Facebook's Kristen Grauman, "Ego4D: Around the World in 2.8K Hours of Egocentric Video." Grauman is a scientist with the company's Facebook AI Research unit.