Audio-Conditioned U-Net for Position Estimation in Full Sheet Images