Cascade-Aware Training of Language Models