Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach