Parallelizing non-linear sequential models over the sequence length