Approximating How Single Head Attention Learns