Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling