Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models
Wu, Haoyang, Wang, Tsun-Hsuan, Lechner, Mathias, Hasani, Ramin, Eckhoff, Jennifer A., Pak, Paul, Meireles, Ozanan R., Rosman, Guy, Ban, Yutong, Rus, Daniela
arXiv.org Artificial Intelligence
Surgical workflow analysis is essential in robot-assisted surgeries, yet the long duration of such procedures poses significant challenges for comprehensive video analysis. Recent approaches have predominantly relied on transformer models; however, their quadratic attention mechanism restricts efficient processing of lengthy surgical videos. In this paper, we propose a novel hierarchical input-dependent state space model that leverages the linear scaling property of state space models to enable decision making on full-length videos while capturing both local and global dynamics. Our framework incorporates a temporally consistent visual feature extractor, which appends a state space model head to a visual feature extractor to propagate temporal information. The proposed model consists of two key modules: a local-aggregation state space model block that captures intricate local dynamics, and a global-relation state space model block that models temporal dependencies across the entire video. The model is trained using a hybrid discrete-continuous supervision strategy, in which both discrete phase labels and continuous phase progress signals are propagated through the network. Experiments show that our method outperforms the current state-of-the-art methods by a large margin (+2.8% on Cholec80, +4.3% on MICCAI2016, and +12.9% on HeiChole). Code will be publicly available after paper acceptance.
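To make the linear-scaling claim concrete, the following is a minimal sketch of an input-dependent state space recurrence, not the authors' implementation. It assumes a diagonal state matrix, a softplus step size, and an input-driven update; the function name `selective_scan` and the weights `w_delta`, `w_b`, and `w_c` are hypothetical. Each frame updates a fixed-size hidden state once, so cost grows linearly with video length rather than quadratically as in full attention.

```python
import numpy as np

def selective_scan(x, a, w_delta, w_b, w_c):
    """Diagonal, input-dependent state space recurrence (illustrative only).

    x:       (T, D) per-frame features
    a:       (N,)   negative decay rates of the diagonal state matrix
    w_delta: (D,)   hypothetical projection producing the step size
    w_b:     (D, N) hypothetical projection producing the input drive
    w_c:     (D, N) hypothetical projection producing the readout
    Returns y: (T,) one scalar output per frame.
    """
    num_frames = x.shape[0]
    h = np.zeros(a.shape[0])           # fixed-size hidden state
    y = np.empty(num_frames)
    for t in range(num_frames):        # single pass: O(T) in sequence length
        delta = np.log1p(np.exp(x[t] @ w_delta))  # softplus keeps the step positive
        b = x[t] @ w_b                 # input-dependent drive, shape (N,)
        c = x[t] @ w_c                 # input-dependent readout, shape (N,)
        a_bar = np.exp(delta * a)      # discretized decay, in (0, 1) when a < 0
        h = a_bar * h + delta * b      # state update (assumed simplified form)
        y[t] = c @ h                   # per-frame output
    return y

# Example with random features standing in for a long video.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 8))
out = selective_scan(
    feats,
    a=-np.ones(4),
    w_delta=rng.normal(size=8),
    w_b=rng.normal(size=(8, 4)),
    w_c=rng.normal(size=(8, 4)),
)
```

Because the step size, drive, and readout all depend on the current frame, the recurrence can decide per frame how much past context to keep, which is the sense of "input-dependent" assumed here.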
Jun-27-2025
- Country:
- Asia > China
- Europe > Germany (0.04)
- North America > United States
- Massachusetts
- Middlesex County > Cambridge (0.05)
- Suffolk County > Boston (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine
- Diagnostic Medicine > Imaging (0.47)
- Surgery (0.86)
- Technology: