To operate in augmented and virtual reality, Facebook believes artificial intelligence will need to develop an "egocentric perspective." To that end, the company on Thursday announced Ego4D, a data set of 2,792 hours of first-person video, and a set of benchmark tests for neural nets, designed to encourage the development of AI that is savvier about what it's like to move through virtual worlds from a first-person perspective. The project is a collaboration between Facebook's Facebook Reality Labs, in conjunction with scholars from thirteen research institutions, including academic institutions and research labs. The details of the work are laid out in a paper lead authored by Facebook's Kristen Grauman, "Ego4D: Around the World in 2.8K Hours of Egocentric Video." Grauman is a research scientist with the company's Facebook AI Research unit.
AI that understands the world from a first-person point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones. Imagine your AR device displaying exactly how to hold the sticks during a drum lesson, guiding you through a recipe, helping you find your lost keys, or recalling memories as holograms that come to life in front of you. To build these new technologies, we need to teach AI to understand and interact with the world like we do, from a first-person perspective -- commonly referred to in the research community as egocentric perception. Today's computer vision (CV) systems, however, typically learn from millions of photos and videos that are captured in third-person perspective, where the camera is just a spectator to the action. "Next-generation AI systems will need to learn from an entirely different kind of data -- videos that show the world from the center of the action, rather than the sidelines," says Kristen Grauman, lead research scientist at Facebook.
Facebook announced on Thursday that it is creating an artificial intelligence capable of viewing and interacting with the outside world the same way a person can. Known as the Ego4D project, the AI project will take the technology to the next level and have it learn from'videos from the center of action,' the social networking giant said in a blog post. The project is comprised of 13 universities and has collected more than 2,200 hours of first-person video from 700 people. It is going to use video and audio from augmented reality and virtual reality devices like its Ray-Bans sunglasses, which were announced last month, or its Oculus VR headsets. The Ego4D project will let AI learn from'videos from the center of action' 'AI that understands the world from this point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones,' the company said in the post.
Facebook has announced a research project that aims to push the "frontier of first-person perception", and in the process help you remember where you left your keys. The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it. In September, the social media giant launched a line of "smart glasses" called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of privacy invasion. Tickets to TNW Conference 2022 are available now!
Facebook is pouring a lot of time and money into augmented reality, including building its own AR glasses with Ray-Ban. Right now, these gadgets can only record and share imagery, but what does the company think such devices will be used for in the future? A new research project led by Facebook's AI team suggests the scope of the company's ambitions. It imagines AI systems that are constantly analyzing peoples' lives using first-person video; recording what they see, do, and hear in order to help them with everyday tasks. Facebook's researchers have outlined a series of skills it wants these systems to develop, including "episodic memory" (answering questions like "where did I leave my keys?") and "audio-visual diarization" (remembering who said what when).