Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation