Thinking Inside the Box: A Comprehensive Spatial Representation for Video Analysis