Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads