Inductive Learning
DINOv3
Siméoni, Oriane, Vo, Huy V., Seitzer, Maximilian, Baldassarre, Federico, Oquab, Maxime, Jose, Cijo, Khalidov, Vasil, Szafraniec, Marc, Yi, Seungeun, Ramamonjisoa, Michaël, Massa, Francisco, Haziza, Daniel, Wehrstedt, Luca, Wang, Jianyuan, Darcet, Timothée, Moutakanni, Théo, Sentana, Leonel, Roberts, Claire, Vedaldi, Andrea, Tolan, Jamie, Brandt, John, Couprie, Camille, Mairal, Julien, Jégou, Hervé, Labatut, Patrick, Bojanowski, Piotr
Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this training paradigm has the potential to learn visual representations from diverse sources, ranging from natural to aerial images -- using a single algorithm. This technical report introduces DINOv3, a major milestone toward realizing this vision by leveraging simple yet effective strategies. First, we leverage the benefit of scaling both dataset and model size by careful data preparation, design, and optimization. Second, we introduce a new method called Gram anchoring, which effectively addresses the known yet unsolved issue of dense feature maps degrading during long training schedules. Finally, we apply post-hoc strategies that further enhance our models' flexibility with respect to resolution, model size, and alignment with text. As a result, we present a versatile vision foundation model that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models. We also share the DINOv3 suite of vision models, designed to advance the state of the art on a wide spectrum of tasks and data by providing scalable solutions for diverse resource constraints and deployment scenarios.
EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision
Pyeon, Myeongjang, Lee, Janghyeon, Lee, Minsoo, Yun, Juseung, Choi, Hwanil, Kim, Jonghyun, Kim, Jiwon, Hu, Yi, Jang, Jongseong, Lee, Soonyoung
In digital pathology, whole-slide images (WSIs) are often difficult to handle due to their gigapixel scale, so most approaches train patch encoders via self-supervised learning (SSL) and then aggregate the patch-level embeddings via multiple instance learning (MIL) or slide encoders for downstream tasks. However, patch-level SSL may overlook complex domain-specific features that are essential for biomarker prediction, such as mutation status and molecular characteristics, because SSL methods rely only on basic augmentations, chosen for natural-image domains, applied to small patch-level areas. Moreover, SSL methods remain less data-efficient than fully supervised approaches, requiring extensive computational resources and datasets to achieve competitive performance. To address these limitations, we present EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Using only 37k WSIs for training, EXAONE Path 2.0 achieves state-of-the-art average performance across 10 biomarker prediction tasks, demonstrating remarkable data efficiency.