Enhancing image captioning with depth information using a Transformer-based framework
