Towards Unsupervised Speech Recognition at the Syllable-Level