Building competitive direct acoustics-to-word models for English conversational speech recognition