Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification