
Collaborating Authors: dino



LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition (Supplementary Material)

Neural Information Processing Systems

In Figure 1, we compare our LMC framework with the Softmax baseline and present qualitative results on the TinyImageNet dataset; we discuss both in more detail below. AUROC is a widely used, threshold-independent evaluation metric. Both authors contributed equally to the work. Before inference, Softmax, like our framework, pre-stores certain CLIP and DINO features to make the inference process more efficient.
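The idea of pre-storing backbone features so that inference avoids repeated encoder forward passes can be sketched as below. This is a minimal illustration, not the authors' implementation: `extract_clip_features` and `extract_dino_features` are hypothetical stand-ins for the real CLIP/DINO encoders, here returning deterministic dummy vectors.

```python
# Hedged sketch: pre-compute backbone features once at setup time so that
# later scoring passes only read a cache instead of re-running the encoders.
# Both extractor functions below are placeholders for real model forward passes.

def extract_clip_features(image_id):
    # placeholder: a real system would run the CLIP image encoder here
    return [(len(image_id) + i) % 7 / 10.0 for i in range(4)]

def extract_dino_features(image_id):
    # placeholder: a real system would run the DINO backbone here
    return [(len(image_id) * (i + 1)) % 5 / 10.0 for i in range(4)]

class FeatureStore:
    """Pre-stores features per image; inference only performs dictionary lookups."""

    def __init__(self):
        self._cache = {}

    def precompute(self, image_ids):
        # One-time cost, paid before the inference loop starts.
        for image_id in image_ids:
            self._cache[image_id] = {
                "clip": extract_clip_features(image_id),
                "dino": extract_dino_features(image_id),
            }

    def get(self, image_id, backbone):
        # No model forward pass at inference time, just a cache read.
        return self._cache[image_id][backbone]

store = FeatureStore()
store.precompute(["img_001", "img_002"])
clip_feat = store.get("img_001", "clip")
```

The trade-off is the usual one for caching: memory for the stored features in exchange for skipping encoder forward passes on every inference call.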




Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency

Neural Information Processing Systems

Self-supervised representation learning (SSL) is rapidly replacing supervised learning as the de facto pretraining strategy for deep networks, owing to improved scalability (unlabeled data is easier to collect) and generality (domain-specific SSL is often preferable to one-size-fits-all ImageNet pretraining [16, 17]).





Not Quite Anything: Overcoming SAM's Limitations for 3D Medical Imaging

Moore, Keith

arXiv.org Artificial Intelligence

Foundation segmentation models such as SAM and SAM-2 perform well on natural images but struggle with brain MRIs, where structures like the caudate and thalamus lack sharp boundaries and have low contrast. Rather than fine-tuning these models (as in MedSAM), we propose a compositional alternative in which the foundation model's output is treated as an additional input channel and passed alongside the MRI to highlight regions of interest. We generate SAM-2 prompts with a lightweight 3D U-Net previously trained on MRI segmentation. Because the U-Net may have been trained on a different dataset, its guesses are often imprecise but usually land in the correct region. The edges of the resulting foundation-model guesses are smoothed to improve alignment with the MRI. We also test prompt-free segmentation using DINO attention maps in the same framework. This "has-a" architecture avoids modifying foundation weights and adapts to domain shift without retraining the foundation model. It reaches about 96 percent volume accuracy on basal ganglia segmentation, which is sufficient for our study of longitudinal volume change. The approach is fast, label-efficient, and robust to out-of-distribution scans. We apply it to study inflammation-linked changes in sudden-onset pediatric OCD.
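The "coarse mask as an extra input channel" composition described above can be sketched as follows. This is an illustrative assumption of the data flow, not the paper's code: the shapes are arbitrary, the `smooth_edges` helper is a crude box-blur stand-in for the edge smoothing mentioned in the abstract, and the coarse mask here is synthetic rather than an actual SAM-2 output.

```python
import numpy as np

def smooth_edges(mask, iterations=1):
    # Crude 3x3 box blur standing in for the edge smoothing step:
    # softens the hard 0/1 boundary of the coarse foundation-model mask.
    smoothed = mask.astype(float)
    for _ in range(iterations):
        padded = np.pad(smoothed, 1, mode="edge")
        smoothed = (
            padded[:-2, :-2] + padded[:-2, 1:-1] + padded[:-2, 2:]
            + padded[1:-1, :-2] + padded[1:-1, 1:-1] + padded[1:-1, 2:]
            + padded[2:, :-2] + padded[2:, 1:-1] + padded[2:, 2:]
        ) / 9.0
    return smoothed

mri_slice = np.random.rand(128, 128)   # one MRI slice, shape (H, W)
coarse_mask = np.zeros((128, 128))     # stand-in for a SAM-2 mask:
coarse_mask[40:80, 40:80] = 1.0        # imprecise, but in the right region

guidance = smooth_edges(coarse_mask, iterations=2)

# Stack MRI and guidance as channels: a (2, H, W) input for a
# two-channel downstream segmentation network that refines the mask.
model_input = np.stack([mri_slice, guidance], axis=0)
```

The key design point is that the foundation model is used as a component ("has-a") whose output merely guides a small trainable network, so the foundation weights never need to be touched when the domain shifts.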


Cross-Domain Few-Shot Learning with Coalescent Projections and Latent Space Reservation

Paeedeh, Naeem, Pratama, Mahardhika, Kamal, Imam Mustafa, Mayer, Wolfgang, Cao, Jimmy, Kowlczyk, Ryszard

arXiv.org Artificial Intelligence

Despite progress in cross-domain few-shot learning, a model pre-trained with DINO and combined with a prototypical classifier still outperforms the latest SOTA methods. A crucial limitation to overcome is that updating too many transformer parameters leads to overfitting due to the scarcity of labeled samples. To address this challenge, we propose a new concept, the coalescent projection, as an effective successor to soft prompts. Additionally, we propose a novel pseudo-class generation method, combined with self-supervised transformations, that relies solely on the base domain to prepare the network for unseen samples from different domains. The proposed method demonstrates its effectiveness in comprehensive experiments on the extreme domain-shift problem of the BSCD-FSL benchmark.
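The prototypical-classifier baseline mentioned above is standard: each class prototype is the mean of its support embeddings (e.g. from a frozen DINO backbone), and a query is assigned to the nearest prototype. The sketch below illustrates that mechanism only; the embeddings are random stand-ins, not real DINO features.

```python
import numpy as np

rng = np.random.default_rng(0)

def prototypes(support_feats, support_labels, n_classes):
    # (n_support, d) support embeddings -> (n_classes, d) class means
    return np.stack([
        support_feats[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])

def predict(query_feats, protos):
    # Euclidean distance from each query to each prototype,
    # then assign the class of the nearest prototype.
    dists = np.linalg.norm(
        query_feats[:, None, :] - protos[None, :, :], axis=-1
    )
    return dists.argmin(axis=1)

support = rng.normal(size=(10, 8))          # 10 support embeddings, dim 8
labels = np.array([0] * 5 + [1] * 5)        # a 2-way, 5-shot episode
protos = prototypes(support, labels, n_classes=2)
preds = predict(support, protos)            # sanity check on the support set
```

Because the backbone stays frozen and only class means are computed, this baseline updates no parameters at all, which is exactly why it sidesteps the few-shot overfitting problem the abstract describes.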