Disentangling Textual and Acoustic Features of Neural Speech Representations