Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models