Unlocking Multi-Modal Potentials for Dynamic Text-Attributed Graph Representation