T-EMDE: Sketching-based global similarity for cross-modal retrieval