Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement