Video-VoT-R1: An efficient video inference model integrating image packing and AoE architecture