Beyond RNNs: Benchmarking Attention-Based Image Captioning Models