M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception
Udugama, U. V. B. L.; Vosselman, George; Nex, Francesco
–arXiv.org Artificial Intelligence
Deploying real-time spatial perception on edge devices requires efficient multi-task models that leverage complementary task information while minimizing computational overhead. This paper introduces Multi-Mono-Hydra (M2H), a novel multi-task learning framework designed for semantic segmentation and depth, edge, and surface normal estimation from a single monocular image. Unlike conventional approaches that rely on independent single-task models or shared encoder-decoder architectures, M2H introduces a Window-Based Cross-Task Attention Module that enables structured feature exchange while preserving task-specific details, improving prediction consistency across tasks. Built on a lightweight ViT-based DINOv2 backbone, M2H is optimized for real-time deployment and serves as the foundation for monocular spatial perception systems supporting 3D scene graph construction in dynamic environments. Comprehensive evaluations show that M2H outperforms state-of-the-art multi-task models on NYUDv2, surpasses single-task depth and semantic baselines on Hypersim, and achieves superior performance on the Cityscapes dataset, all while maintaining computational efficiency on laptop hardware. Beyond benchmarks, M2H is validated on real-world data, demonstrating its practicality in spatial perception tasks.
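The abstract names a Window-Based Cross-Task Attention Module but does not specify its internals. The following is a minimal NumPy sketch of the general idea only, not the authors' implementation: features from one task attend to features from another task inside non-overlapping spatial windows, and the result is added back residually so the target task keeps its own details. The function names, the window size, the absence of learned query/key/value projections, and the single-head formulation are all assumptions made for illustration.

```python
import numpy as np


def window_partition(x, w):
    """Split an (H, W, C) feature map into (num_windows, w*w, C) non-overlapping windows."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)


def window_merge(win, w, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) feature map."""
    C = win.shape[-1]
    x = win.reshape(H // w, W // w, w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def softmax(a, axis=-1):
    """Numerically stable softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def window_cross_task_attention(target, source, w):
    """Hypothetical cross-task exchange: queries come from `target`,
    keys and values come from `source`, and attention is restricted to
    each w x w window. No learned projections are used in this sketch."""
    H, W, C = target.shape
    q = window_partition(target, w)             # (nW, w*w, C)
    kv = window_partition(source, w)            # (nW, w*w, C)
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(C))
    exchanged = attn @ kv                       # (nW, w*w, C)
    # Residual connection keeps the target task's own features.
    return target + window_merge(exchanged, w, H, W)


# Toy usage: depth features attend to segmentation features.
rng = np.random.default_rng(0)
depth_feat = rng.normal(size=(8, 8, 4))
seg_feat = rng.normal(size=(8, 8, 4))
fused = window_cross_task_attention(depth_feat, seg_feat, w=4)
```

Restricting attention to windows keeps the cost proportional to the number of windows times the square of the window area, rather than the square of the full image area, which is the usual motivation for window-based attention on resource-limited hardware.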
Oct-21-2025
- Country:
  - Europe
    - Belgium > Flanders > Flemish Brabant > Leuven (0.04)
    - Germany (0.04)
    - Netherlands (0.04)
    - Switzerland > Zürich > Zürich (0.05)
  - North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Genre:
  - Research Report (0.40)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.68)
    - Representation & Reasoning (1.00)
    - Vision (1.00)