Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts