Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models
