CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation