Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision

Zhou, Rulin, He, Wenlong, Wang, An, Yao, Qiqi, Hu, Haijun, Wang, Jiankun, Zhang, Xi, and Ren, Hongliang

Mar-28-2025–arXiv.org Artificial Intelligence

Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions due to limited feature utilization and annotation dependence. We present Endo-TTAP, a novel framework addressing these challenges through: (1) A Multi-Facet Guided Attention (MFGA) module that synergizes multi-scale flow dynamics, DINOv2 semantic embeddings, and explicit motion patterns to jointly predict point positions with uncertainty and occlusion awareness; (2) A two-stage curriculum learning strategy employing an Auxiliary Curriculum Adapter (ACA) for progressive initialization and hybrid supervision. Stage I utilizes synthetic data with optical flow ground truth for uncertainty-occlusion regularization, while Stage II combines unsupervised flow consistency and semi-supervised learning with refined pseudo-labels from off-the-shelf trackers. Extensive validation on two MICCAI Challenge datasets and our collected dataset demonstrates that Endo-TTAP achieves state-of-the-art performance in tissue point tracking, particularly in scenarios characterized by complex endoscopic conditions. The source code and dataset will be available at https://anonymous.4open.science/r/Endo-TTAP-36E5.
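The abstract lists unsupervised flow consistency as one of the Stage II training signals but does not define it. A common way to measure flow consistency is a forward-backward (cycle) check. A point is warped forward with the forward flow and then brought back with the backward flow, and a large round-trip drift is treated as a sign of occlusion or flow error. The NumPy sketch below shows that generic check. The function names, the thresholds `alpha` and `beta` (taken from a criterion widely used in unsupervised optical-flow work), and the masked-mean loss are illustrative assumptions, not Endo-TTAP's actual formulation.

```python
import numpy as np


def bilinear_sample(flow: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Sample a dense flow field of shape (H, W, 2) at sub-pixel points (N, 2) given as (x, y)."""
    h, w, _ = flow.shape
    # Clamp to the frame so points that leave the image reuse the border flow.
    x = np.clip(pts[:, 0], 0, w - 1)
    y = np.clip(pts[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * flow[y0, x0] + wx * flow[y0, x1]
    bottom = (1 - wx) * flow[y1, x0] + wx * flow[y1, x1]
    return (1 - wy) * top + wy * bottom


def forward_backward_check(flow_fw, flow_bw, pts, alpha=0.01, beta=0.5):
    """Return the per-point cycle error and a consistency mask (hypothetical helper).

    A point is kept when |w_f + w_b|^2 < alpha * (|w_f|^2 + |w_b|^2) + beta,
    where w_f is the forward flow at the point and w_b is the backward flow
    sampled at the point's forward-warped position.
    """
    w_f = bilinear_sample(flow_fw, pts)
    w_b = bilinear_sample(flow_bw, pts + w_f)
    cycle = w_f + w_b  # zero for a perfectly consistent round trip
    err_sq = np.sum(cycle ** 2, axis=1)
    bound = alpha * (np.sum(w_f ** 2, axis=1) + np.sum(w_b ** 2, axis=1)) + beta
    return np.sqrt(err_sq), err_sq < bound


def flow_consistency_loss(flow_fw, flow_bw, pts):
    """Mean cycle error over points that pass the check, or 0.0 if none pass (hypothetical)."""
    err, mask = forward_backward_check(flow_fw, flow_bw, pts)
    return float(err[mask].mean()) if mask.any() else 0.0


if __name__ == "__main__":
    # Toy example: uniform 2-pixel shift to the right, exactly undone by the backward flow.
    flow_fw = np.zeros((32, 32, 2))
    flow_fw[..., 0] = 2.0
    pts = np.array([[5.0, 5.0], [10.5, 20.25]])
    print(forward_backward_check(flow_fw, -flow_fw, pts))
    # A backward flow of zero breaks the cycle, so both points are flagged as inconsistent.
    print(forward_backward_check(flow_fw, np.zeros_like(flow_fw), pts))
```

NumPy has no autograd, so a training pipeline would compute the same quantities in a differentiable framework. The consistency mask also serves as a rough occlusion estimate, which is one reason cycle checks pair naturally with occlusion-aware point tracking.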

artificial intelligence, machine learning, survey article

Country:
- Asia > China (0.30)

Genre:
- Overview (0.46)
- Research Report (0.64)

Industry:
- Health & Medicine > Surgery (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks (0.46)
    - Unsupervised or Indirectly Supervised Learning (0.34)
  - Vision (1.00)