Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning