Long-Form Speech Generation with Spoken Language Models