Spatially Visual Perception for End-to-End Robotic Learning