Exploiting Multi-Modal Interactions: A Unified Framework