How to Train Long-Context Language Models (Effectively)