Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI