Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models