Towards A Unified Neural Architecture for Visual Recognition and Reasoning