Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering