LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition (Supplementary Material)

Neural Information Processing Systems

In Figure 1, we compare our LMC framework with the baseline Softmax and present qualitative results on the TinyImageNet dataset; below, we discuss them in more detail. AUROC is a widely used, threshold-independent evaluation metric. Before entering the inference process, Softmax, like our framework, also pre-stores certain CLIP and DINO features to make inference more efficient. Both authors contributed equally to the work.
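The abstract cites AUROC as its threshold-independent metric. As context only (this is a generic sketch of the metric, not part of the LMC framework), AUROC can be computed directly from open-set scores as the probability that a known-class sample ranks above an unknown one; the function name and toy scores below are illustrative:

```python
import numpy as np

def auroc(scores, labels):
    """Threshold-independent AUROC: the probability that a randomly chosen
    known-class sample scores higher than a randomly chosen unknown one.
    `labels` are 1 for known (in-distribution), 0 for unknown."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count pairwise wins; ties count as 0.5 (standard AUROC convention).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Perfectly separated scores give AUROC = 1.0
print(auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 1.0
```

Being rank-based, the value is unchanged by any monotone rescaling of the scores, which is why no detection threshold needs to be chosen.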




Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency

Neural Information Processing Systems

Self-supervised representation learning (SSL) is rapidly replacing supervised learning as the de-facto pretraining strategy for deep networks, due to improved scalability (unlabeled data is easier to collect) and generality (domain-specific SSL is often preferable to one-size-fits-all ImageNet pretraining [16, 17]).





Supplementary Materials for the Paper "Towards Free Data Selection with General-Purpose Models"

Anonymous Author(s)

Neural Information Processing Systems

The detailed spectral clustering algorithm is shown in Alg. 1. This spectral clustering algorithm should be inserted into line 7 of Alg. 1 in our main paper. Interestingly, these two feature clustering strategies lead to similar data selection performance on the PASCAL VOC [7] object detection task. In this part, we pay attention to the effect of pretraining on the final performance of FreeSel.
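The supplementary refers to a spectral clustering step for grouping features. As a generic sketch of that family of algorithms (not the paper's exact Alg. 1; function name, RBF affinity, and bandwidth are our assumptions), spectral clustering embeds points via the eigenvectors of a normalized graph Laplacian and then runs k-means in that embedding:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(features, k, sigma=1.0, seed=0):
    """Generic spectral clustering sketch: RBF affinity ->
    normalized Laplacian -> k-means on the bottom-k eigenvectors."""
    X = np.asarray(features, dtype=float)
    # Pairwise squared distances and Gaussian (RBF) affinity.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    D_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(X)) - D_inv_sqrt[:, None] * W * D_inv_sqrt[None, :]
    # Embed each point with the k smallest eigenvectors (rows normalized),
    # then cluster the embedded points with k-means.
    _, vecs = np.linalg.eigh(L)
    emb = vecs[:, :k]
    emb /= np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(emb, k, minit='++', seed=seed)
    return labels
```

Because the affinity graph, not raw Euclidean distance, drives the grouping, such a step can separate clusters that k-means alone would merge.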


Not Quite Anything: Overcoming SAM's Limitations for 3D Medical Imaging

Moore, Keith

arXiv.org Artificial Intelligence

Foundation segmentation models such as SAM and SAM-2 perform well on natural images but struggle with brain MRIs, where structures like the caudate and thalamus lack sharp boundaries and have low contrast. Rather than fine-tune these models (as in, for example, MedSAM), we propose a compositional alternative in which the foundation model's output is treated as an additional input channel and passed alongside the MRI to highlight regions of interest. We generate SAM-2 prompts with a lightweight 3D U-Net previously trained on MRI segmentation. The U-Net may have been trained on a different dataset, so its guesses are often imprecise but usually land in the correct region. The edges of the resulting foundation model guesses are smoothed to improve alignment with the MRI. We also test prompt-free segmentation using DINO attention maps in the same framework. This "has-a" architecture avoids modifying foundation weights and adapts to domain shift without retraining the foundation model. It reaches about 96 percent volume accuracy on basal ganglia segmentation, which is sufficient for our study of longitudinal volume change. The approach is fast, label-efficient, and robust to out-of-distribution scans. We apply it to study inflammation-linked changes in sudden-onset pediatric OCD.
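The composition described above, smoothing the foundation model's mask and stacking it with the MRI as an extra input channel, can be sketched as follows. This is a minimal illustration under our own assumptions (function name, Gaussian smoothing width, and normalization scheme are not from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compose_input(mri_volume, foundation_mask, sigma=1.5):
    """Sketch of the 'has-a' composition: soften the foundation model's
    binary mask, then stack it with the MRI as an extra input channel
    for a downstream 3D segmentation network."""
    mri = np.asarray(mri_volume, dtype=np.float32)
    # Soften hard mask edges so the guidance channel aligns more gently
    # with the low-contrast structure boundaries in the MRI.
    soft_mask = gaussian_filter(
        np.asarray(foundation_mask, dtype=np.float32), sigma
    )
    # Normalize MRI intensities to a comparable range.
    mri_norm = (mri - mri.mean()) / (mri.std() + 1e-8)
    # Channel-first stack: shape (2, D, H, W) for a 3D U-Net style model.
    return np.stack([mri_norm, soft_mask], axis=0)
```

Because the foundation model is only consumed through this extra channel, its weights stay frozen and the downstream network is free to discount the guidance wherever it disagrees with the image.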