Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Park, Yein, Yoon, Chanwoong, Park, Jungwoo, Jeong, Minbyul, Kang, Jaewoo

arXiv.org Artificial Intelligence

While the ability of language models to elicit facts has been widely investigated, how they handle temporally changing facts remains underexplored. Through circuit analysis, we discover Temporal Heads: specific attention heads primarily responsible for processing temporal knowledge. We confirm that these heads are present across multiple models, though their exact locations vary, and that their responses differ depending on the type of knowledge and its corresponding years. Disabling these heads degrades the model's ability to recall time-specific knowledge while leaving its general capabilities intact, compromising neither time-invariant knowledge nor question-answering performance. Moreover, the heads are activated not only by numeric conditions ("In 2004") but also by textual aliases ("In the year ..."), indicating that they encode a temporal dimension beyond simple numerical representation. Finally, we extend these findings by demonstrating how temporal knowledge can be edited by adjusting the values of these heads.
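The head-ablation probe described in the abstract can be illustrated with a toy multi-head attention forward pass in which one head's output is zeroed before the heads are combined. This is a minimal pure-Python sketch under our own assumptions (toy shapes, shared weights across heads, illustrative function names); it is not the authors' code or their exact intervention.

```python
import math

def attention_head(query, keys, values):
    """Single-head scaled dot-product attention over toy list-based vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return out

def multi_head_output(query, keys, values, num_heads, disabled=()):
    """Concatenate per-head outputs, zeroing any head index in `disabled`.

    Zeroing a head's contribution while leaving the rest of the forward
    pass unchanged mimics the "disabling" ablation used to test whether a
    head carries time-specific knowledge.
    """
    outputs = []
    for h in range(num_heads):
        out = attention_head(query, keys, values)
        if h in disabled:
            out = [0.0] * len(out)  # ablate this head
        outputs.extend(out)
    return outputs
```

In a real model one would instead hook the attention module of a specific layer and zero (or rescale) the chosen head's slice of the output; the comparison of recall accuracy with and without the ablation is what localizes the Temporal Heads.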


Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity

Xi, Haocheng, Yang, Shuo, Zhao, Yilong, Xu, Chenfeng, Li, Muyang, Li, Xiuyu, Lin, Yujun, Cai, Han, Zhang, Jintao, Li, Dacheng, Chen, Jianfei, Stoica, Ion, Keutzer, Kurt, Han, Song

arXiv.org Artificial Intelligence

Diffusion Transformers (DiTs) dominate video generation, but their high computational cost severely limits real-world applicability: generating a few seconds of video usually takes tens of minutes even on high-performance GPUs. This inefficiency primarily arises from the quadratic computational complexity of 3D Full Attention with respect to the context length. In this paper, we propose Sparse VideoGen (SVG), a training-free framework that leverages the inherent sparsity in 3D Full Attention to boost inference efficiency. We reveal that attention heads can be dynamically classified into two groups according to their distinct sparse patterns: (1) Spatial Heads, where only spatially related tokens within each frame dominate the attention output, and (2) Temporal Heads, where only temporally related tokens across different frames dominate. Based on this insight, SVG employs an online profiling strategy to capture the dynamic sparse patterns and predict the type of each attention head. Combined with a novel hardware-efficient tensor layout transformation and customized kernel implementations, SVG achieves up to 2.28x and 2.33x end-to-end speedups on CogVideoX-v1.5 and HunyuanVideo, respectively, while preserving generation quality.
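The spatial-vs-temporal distinction can be sketched as a simple classifier over a head's attention weights: if most of the attention mass falls on keys in the same video frame as the query, the head behaves spatially; otherwise temporally. The mass-ratio criterion and the 0.5 threshold below are illustrative stand-ins for SVG's online profiling, not its actual decision rule, and all names are ours.

```python
def classify_head(attn_weights, frame_of, threshold=0.5):
    """Label an attention head 'spatial' or 'temporal' from its weights.

    attn_weights[i][j] is the attention weight from query token i to key
    token j; frame_of[t] gives the video frame token t belongs to. A head
    whose mass concentrates on same-frame keys is called 'spatial'; one
    whose mass goes to other frames is called 'temporal'.
    """
    same = 0.0
    total = 0.0
    for i, row in enumerate(attn_weights):
        for j, w in enumerate(row):
            total += w
            if frame_of[i] == frame_of[j]:
                same += w
    return "spatial" if same / total >= threshold else "temporal"
```

Once a head is labeled, a sparse kernel can restrict its computation to the dominant token group (intra-frame blocks for spatial heads, cross-frame strides for temporal heads), which is where the reported speedups come from.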