Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration