Enhancing image captioning with depth information using a Transformer-based framework