Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks