Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering