Expressive and Scalable Quantum Fusion for Multimodal Learning