Towards Matching Phones and Speech Representations