Pre-trained Large Language Models Learn to Predict Hidden Markov Models In-context

Neural Information Processing Systems 

Hidden Markov Models (HMMs) are fundamental tools for modeling sequential data whose latent states follow Markovian dynamics. However, fitting them to real-world datasets is difficult and computationally expensive. In this work, we demonstrate that pre-trained large language models (LLMs) can effectively model data generated by HMMs through in-context learning (ICL), that is, their ability to learn patterns from examples provided within the input context. We evaluate LLMs on a diverse set of synthetic HMMs and show that their prediction accuracy converges to the theoretical optimum. We also identify novel scaling trends that depend on properties of the HMM, and we offer theoretical conjectures to explain these empirical observations.
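To make the setup concrete, the following is a minimal sketch of how a synthetic HMM can be sampled and how a reference next-observation accuracy can be computed. It assumes that the "theoretical optimum" refers to a predictor that knows the true HMM parameters and applies the forward (filtering) recursion; that reading, the function names, and the two-state example parameters are illustrative assumptions and are not taken from the paper.

```python
import random

def sample_hmm(A, B, pi, T, rng):
    """Sample T observations from an HMM.
    A[i][j]: transition prob i -> j, B[i][k]: prob of emitting k in state i,
    pi[i]: initial state prob. All names are illustrative."""
    def draw(probs):
        return rng.choices(range(len(probs)), weights=probs)[0]
    state, obs = draw(pi), []
    for _ in range(T):
        obs.append(draw(B[state]))
        state = draw(A[state])
    return obs

def predictive_distributions(A, B, pi, obs):
    """Forward filtering with known parameters.
    Entry t of the result is P(o_{t+1} | o_1..o_t); entry 0 uses no history."""
    n, m = len(pi), len(B[0])
    belief = list(pi)  # P(current state | observations so far)
    out = []
    for o in list(obs) + [None]:
        # Predict the next observation from the current state belief.
        out.append([sum(belief[i] * B[i][k] for i in range(n)) for k in range(m)])
        if o is None:
            break
        # Condition on the observation, then advance one transition step.
        post = [belief[i] * B[i][o] for i in range(n)]
        z = sum(post)  # zero only if o is impossible under the model
        post = [p / z for p in post]
        belief = [sum(post[i] * A[i][j] for i in range(n)) for j in range(n)]
    return out

def oracle_accuracy(A, B, pi, obs):
    """Fraction of steps where the most likely predicted symbol matches."""
    preds = predictive_distributions(A, B, pi, obs)
    hits = sum(max(range(len(p)), key=p.__getitem__) == o
               for p, o in zip(preds, obs))
    return hits / len(obs)

# Hypothetical two-state, two-symbol HMM used only for illustration.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
pi = [0.5, 0.5]

rng = random.Random(0)
obs = sample_hmm(A, B, pi, 500, rng)
acc = oracle_accuracy(A, B, pi, obs)
print(f"oracle next-observation accuracy: {acc:.3f}")
```

Under this reading, an in-context learner would be given only the observation sequence `obs` as a prompt, and its next-symbol accuracy would be compared against the value returned by `oracle_accuracy` as the context length grows.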