Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning