Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
