A Transformer-based Audio Captioning Model with Keyword Estimation