Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction