Fine-grained Early Frequency Attention for Deep Speaker Representation Learning