Communication breakdown: On the low mutual intelligibility between human and neural captioning