Attention networks for image-to-text