Understanding and Improving Length Generalization in Recurrent Models