Scene-Centric Joint Parsing of Cross-View Videos