Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information