Appendix: Training Transitive and Commutative Multimodal Transformers with LoReTTa

Manuel Tran

Neural Information Processing Systems 

In our SVL-MNIST experiments, we freeze the backbone and train a linear classifier on top. The initial learning rate is 0.1, and we do not use weight decay. We divide the SVL-MNIST dataset into training, validation, and test sets (Figure A1). The first dataset (I, T) consists of 12,000 paired samples from MNIST and WineReviews.
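The linear-probing setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `train_linear_probe`, the feature shapes, and the use of plain gradient descent on a softmax classifier are all assumptions; the frozen backbone is modeled by treating the features as fixed, precomputed inputs. Only the learning rate of 0.1 and the absence of weight decay are taken from the text.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=200, seed=0):
    """Train a softmax linear classifier on frozen features.

    The backbone is frozen, so `features` are treated as fixed inputs
    and only the linear layer's weights W and bias b are updated.
    Plain gradient descent with lr=0.1 and no weight decay, per the text.
    (Hypothetical sketch; shapes and optimizer are assumptions.)
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]

    for _ in range(epochs):
        logits = features @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss w.r.t. the logits.
        grad_logits = (probs - onehot) / n
        W -= lr * (features.T @ grad_logits)   # no weight-decay term added
        b -= lr * grad_logits.sum(axis=0)
    return W, b
```

In practice the frozen backbone would be run once over the dataset to produce the feature matrix, after which only this lightweight classifier is optimized.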