VHASR: A Multimodal Speech Recognition System With Vision Hotwords