Rethinking and Improving Multi-task Learning for End-to-end Speech Translation