Top-Down Semantic Refinement for Image Captioning