VET-DINO: Learning Anatomical Understanding Through Multi-View Distillation in Veterinary Imaging
Dourson, Andre; Taylor, Kylie; Qiao, Xiaoli; Fitzke, Michael
arXiv.org Artificial Intelligence
Self-supervised learning has emerged as a powerful paradigm for training deep neural networks, particularly in medical imaging where labeled data is scarce. While current approaches typically rely on synthetic augmentations of single images, we propose VET-DINO, a framework that leverages a unique characteristic of medical imaging: the availability of multiple standardized views from the same study. Using a series of clinical veterinary radiographs from the same patient study, we enable models to learn view-invariant anatomical structures and develop an implied 3D understanding from 2D projections. We demonstrate our approach on a dataset of 5 million veterinary radiographs from 668,000 canine studies. Through extensive experimentation, including view synthesis and downstream task performance, we show that learning from real multi-view pairs leads to superior anatomical understanding compared to purely synthetic augmentations. VET-DINO achieves state-of-the-art performance on various veterinary imaging tasks. Our work establishes a new paradigm for self-supervised learning in medical imaging that leverages domain-specific properties rather than merely adapting natural image techniques.
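The core idea in the abstract, pairing real views from the same study rather than two synthetic augmentations of one image, can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout `(study_id, image_id)`, the function names `group_by_study` and `sample_view_pair`, and the toy identifiers are all hypothetical, and the sketch only shows how positive pairs might be drawn before being passed to a DINO-style student and teacher.

```python
import random
from collections import defaultdict


def group_by_study(records):
    """Group image ids by study id.

    `records` is an iterable of (study_id, image_id) tuples; this layout is
    an assumption made for illustration, not the paper's data format.
    """
    studies = defaultdict(list)
    for study_id, image_id in records:
        studies[study_id].append(image_id)
    return dict(studies)


def sample_view_pair(studies, rng):
    """Return (study_id, view_a, view_b) with two distinct images of one study.

    Only studies with at least two radiographs can supply a real
    multi-view positive pair.
    """
    eligible = [s for s, images in studies.items() if len(images) >= 2]
    if not eligible:
        raise ValueError("no study has two or more views")
    study_id = rng.choice(eligible)
    view_a, view_b = rng.sample(studies[study_id], 2)  # sampled without replacement
    return study_id, view_a, view_b


# Toy records: two studies with several views, one study with a single view.
records = [
    ("study_1", "lateral_thorax"),
    ("study_1", "vd_thorax"),
    ("study_2", "lateral_abdomen"),
    ("study_2", "vd_abdomen"),
    ("study_2", "lateral_pelvis"),
    ("study_3", "lateral_skull"),
]

studies = group_by_study(records)
rng = random.Random(0)
study_id, view_a, view_b = sample_view_pair(studies, rng)
```

In a training loop, `view_a` and `view_b` would take the place of the two augmented crops that a single-image pipeline generates; the single-view `study_3` is never selected because it cannot supply a pair.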
May-22-2025
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Technology: