Interpretable Visual Understanding with Cognitive Attention Network