Finding the Pillars of Strength for Multi-Head Attention