Efficient Quantification of Multimodal Interaction at Sample Level