Fine-Grained Retrieval-Augmented Generation for Visual Question Answering