Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information