CMU Helps Compile Largest Collection of First-Person Videos
Researchers at Carnegie Mellon University helped compile and will have access to the largest collection of point-of-view videos in the world. These videos could enable artificial intelligence to understand the world from a first-person point of view and unlock a new wave of virtual assistants, augmented reality and robotics. Until now, most of the video used to train computer vision models came from the third-person point of view. The first-person, or egocentric, video included in this collection will allow researchers to train computer vision systems to see the world as humans do. "For the first time, we'll have enough data to be able to teach computers to see what we see," said Kris Kitani, an associate research professor in the Robotics Institute who led CMU's efforts to collect data.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.40)
- Africa > Rwanda (0.06)
AutoSelect: Automatic and Dynamic Detection Selection for 3D Multi-Object Tracking
3D multi-object tracking is an important component in robotic perception systems such as self-driving vehicles. Recent work follows a tracking-by-detection pipeline, which aims to match past tracklets with detections in the current frame. To avoid matching with false positive detections, prior work filters out detections with low confidence scores via a threshold. However, finding a proper threshold is non-trivial and requires extensive manual search via ablation studies. Moreover, this threshold is sensitive to factors such as the target object category, so it must be re-tuned whenever these factors change. To ease this process, we propose to automatically select high-quality detections, removing the effort of manual threshold search. Also, prior work often uses a single threshold per data sequence, which is sub-optimal in particular frames or for certain objects. Instead, we dynamically search for a threshold per frame or per object to further boost performance. Through experiments on KITTI and nuScenes, our method can filter out $45.7\%$ of false positives while maintaining recall, achieving new state-of-the-art performance and removing the need for manual threshold tuning.
- Transportation (0.46)
- Information Technology (0.46)
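The threshold-filtering step the abstract describes can be sketched in a few lines. This is an illustrative toy, not the paper's actual method: the function names, the dictionary-based detection format, and the percentile heuristic for picking a per-frame threshold are all assumptions made for the example.

```python
# Hypothetical sketch of confidence filtering in a tracking-by-detection
# pipeline. Detection format and the percentile heuristic are illustrative,
# not the method described in the abstract.

def filter_detections(detections, threshold):
    """Keep only detections whose confidence score meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]

def dynamic_threshold(detections, percentile=0.5):
    """Pick a per-frame threshold from this frame's own score distribution
    (here, a percentile), rather than one fixed value per sequence."""
    if not detections:
        return 0.0
    scores = sorted(d["score"] for d in detections)
    idx = min(int(len(scores) * percentile), len(scores) - 1)
    return scores[idx]

frame = [{"score": 0.9}, {"score": 0.2}, {"score": 0.7}]
thr = dynamic_threshold(frame)       # threshold adapted to this frame
kept = filter_detections(frame, thr)
```

A per-frame threshold like this adapts to how confident the detector is on each frame, which is the intuition behind replacing a single hand-tuned, sequence-wide value.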
Artificial Intelligence Predicts a Picture's Future
Given a still image, a new artificial intelligence system can generate videos that simulate the future of that scene to predict what might happen next. Currently, these videos are less than two seconds long and can make people look like blobs. But researchers hope that in the future, more powerful versions of this system could help robots navigate homes and offices and also lead to safer self-driving cars. Computers have grown steadily better at recognizing faces and other items within images. However, they still have major problems envisioning how the scenes they see might change, given the virtually limitless number of ways that items within images can interact.
- Transportation > Ground (0.36)
- Information Technology (0.36)