Mutual Information Scaling and Expressive Power of Sequence Models