Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering