Multiple versions of this dataset exist in the literature; we use the version by Ravi and Larochelle [43]. The original version of the dataset contains 43 images that are also present in ImageNet. We remove these duplicates to avoid overestimating the transfer capability during evaluation.

VGGFlowers: Originally introduced by Nilsback and Zisserman [38], VGGFlowers consists of 102 flower categories, with each category containing between 40 and 258 images.

A.3 Training algorithms

For the metric-based family, we use ProtoNet with Euclidean [53] and scaled negative cosine similarity measures [20].
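To make the two measures concrete, here is a minimal ProtoNet scoring sketch in PyTorch. The episode sizes, embedding dimension, and the cosine scale factor below are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def prototypes(support, labels, n_way):
    # Class prototypes: the mean support embedding of each class.
    return torch.stack([support[labels == c].mean(0) for c in range(n_way)])

def euclidean_logits(queries, protos):
    # Snell et al.-style logits: negative squared Euclidean distance.
    return -torch.cdist(queries, protos).pow(2)

def scaled_cosine_logits(queries, protos, scale=10.0):
    # Scaled cosine similarity; the scale value here is illustrative.
    q = F.normalize(queries, dim=-1)
    p = F.normalize(protos, dim=-1)
    return scale * (q @ p.t())

# Toy 5-way episode with 3 support embeddings per class (sizes are made up).
n_way, d = 5, 8
support = torch.randn(n_way * 3, d)
support_labels = torch.arange(n_way).repeat_interleave(3)
queries, query_labels = torch.randn(10, d), torch.randint(n_way, (10,))
protos = prototypes(support, support_labels, n_way)
loss = F.cross_entropy(euclidean_logits(queries, protos), query_labels)
```

Either logit function can feed the same cross-entropy objective; the only difference between the two ProtoNet variants is the choice of similarity measure.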
Group Contrastive Learning for Weakly Paired Multimodal Data
Aditya Gorla, Hugues Van Assel, Jan-Christian Huetter, Heming Yao, Kyunghyun Cho, Aviv Regev, Russell Littman
We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that bridges the gap between CLIP for paired cross-modal data and SupCon for uni-modal supervised contrastive learning, addressing a fundamental gap in contrastive learning for weakly paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder framework to encourage cross-modally entangled representations while maintaining group-level coherence within a shared latent space. Critically, we introduce a comprehensive combinatorial evaluation framework that systematically assesses representation learners across multiple optimal transport aligners, addressing key limitations in existing evaluation strategies. This framework includes novel simulations that systematically vary shared versus modality-specific perturbation effects, enabling principled assessment of method robustness. Our combinatorial benchmarking reveals that there is not yet an aligner that uniformly dominates across settings or modality pairs. Across simulations and two real single-cell genetic perturbation datasets, GROOVE performs on par with or outperforms existing approaches for downstream cross-modal matching and imputation tasks. Our ablation studies demonstrate that GroupCLIP is the key component driving performance gains. These results highlight the importance of leveraging group-level constraints for effective multi-modal representation learning in scenarios where only weak pairing is available.
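The paper defines GroupCLIP precisely; as a rough sketch of the idea described in the abstract, one plausible instantiation treats every cross-modal pair that shares a perturbation label as a positive, SupCon-style, inside a CLIP-style InfoNCE. The function name, batch sizes, and temperature below are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def group_contrastive_loss(za, zb, groups_a, groups_b, tau=0.1):
    """SupCon-style cross-modal loss: for each modality-A anchor, every
    modality-B sample sharing its perturbation label is a positive.
    A plausible sketch only, not the paper's exact GroupCLIP objective."""
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / tau                       # (Na, Nb) similarities
    pos = (groups_a[:, None] == groups_b[None, :]).float()
    log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
    # Average log-likelihood over each anchor's positive set (as in SupCon);
    # anchors with no positive contribute zero via the clamp.
    loss = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

# Toy weakly paired batch: 6 samples per modality, 3 shared perturbation labels.
za, zb = torch.randn(6, 16), torch.randn(6, 16)
groups_a = torch.tensor([0, 0, 1, 1, 2, 2])
groups_b = torch.tensor([0, 1, 1, 2, 2, 0])
print(group_contrastive_loss(za, zb, groups_a, groups_b))
```

Under this formulation, when each group contains exactly one sample per modality the loss reduces to CLIP's pairwise InfoNCE, and applying it within a single modality recovers a SupCon-style objective, which is the bridge the abstract describes.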
Correction of Decoupled Weight Decay
Decoupled weight decay, which is solely responsible for the performance advantage of AdamW over Adam, has long been set proportional to the learning rate γ without question. For adaptive gradient methods such as Adam (Kingma & Ba, 2015), unlike SGD with momentum (Sutskever et al., 2013), weight decay is no longer equivalent to L2 regularization. On this basis, Defazio (2025) presents experiments on the Llama 3 architecture (Grattafiori et al., 2024), in which most layers are not immediately followed by normalization; that work states that "we consider every linear layer as normalized, excluding the output layer of the network" for the purpose of applying its corrected weight decay, and reports that AdamC nevertheless yields more stable weight and gradient norms than the AdamW baseline. To the contrary, we find that eliminating the contribution of the perpendicular component of the update to the weight norm leads to little change in the training dynamics. Consider the "Renormalized" AdamW optimizer (Algorithm 1), which eliminates the contribution of the component of the update u perpendicular to the weights from the weight norm: we train a variant of ViT-S/16 based on the setup described in Beyer et al. (2022) on the ImageNet-1k dataset (Russakovsky et al., 2015) for 90 epochs and instead observe almost no differences in relevant metrics (Figure 1).
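To make the norm-decomposition argument concrete, here is a hedged sketch of a single renormalized update step. The paper's Algorithm 1 is the authoritative version; the function name, the 1-D parameter shape, and treating u as the already-preconditioned Adam update are assumptions for illustration.

```python
import torch

def renormalized_adamw_step(w, u, lr=1e-3, wd=0.1, eps=1e-12):
    """One decoupled-weight-decay step that then removes the weight-norm
    growth contributed by the component of the update perpendicular to w.
    A sketch of the idea only, not the paper's Algorithm 1; w is assumed
    1-D and u is assumed to be the already-preconditioned Adam update."""
    w_dir = w / w.norm().clamp(min=eps)
    u_par = (u @ w_dir) * w_dir                # component of u along w
    w_new = w - lr * u - lr * wd * w           # AdamW-style step
    # Rescale so the norm matches a step taken with the parallel component only,
    # eliminating the perpendicular component's contribution to the weight norm.
    target_norm = (w - lr * u_par - lr * wd * w).norm()
    return w_new * (target_norm / w_new.norm().clamp(min=eps))

w = renormalized_adamw_step(torch.randn(64), torch.randn(64))
```

The key point the text makes is that the perpendicular component only ever grows the weight norm (since it is orthogonal to w); removing that growth, as above, reportedly leaves the training dynamics almost unchanged.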