Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning