How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization