Multiscale sequence modeling with a learned dictionary