A Neural Compositional Paradigm for Image Captioning