Learning to Separate Voices by Spatial Regions