Understanding Multimodal Deep Neural Networks: A Concept Selection View