Self-Supervised Generation of Spatial Audio for 360° Video