Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs