Head-wise Shareable Attention for Large Language Models