Multi-modal Learning with Prior Visual Relation Reasoning