simclr
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings Appendix
We provide the hyper-parameters of our models in Table A.1. Table A.1: Hyper-parameters used for training our VisualCSE and AudioCSE. Vision, we use Dropout augmentation (the same strategy as in SimCSE) for AudioCSE. We compare unsup-SimCSE and unsup-VisualCSE on a small-scale retrieval test. As shown in Table C.1, VisualCSE generally retrieves qualitatively different sentences than SimCSE.
Change Event Dataset for Discovery from Spatio-temporal Remote Sensing Imagery
Thus, instead of simply detecting changed pixels, we want to identify change events. We define a change event as a group of pixels over space and time that are all changed by a single event. We are interested in developing systems that can automatically detect change events and assign to each a semantic label that indicates the nature of the event, e.g., forest fires, road construction, etc. Identifying change events is a much more challenging problem than change detection.
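To make the definition of a change event concrete, the following is a minimal sketch that groups changed pixels into candidate events. It assumes, purely for illustration, that pixels belong to the same event when they are adjacent in space or in consecutive time steps (6-connectivity); this is a stand-in heuristic, not the method of the paper. The function name `group_change_events` and the `mask[t][row][col]` layout are hypothetical.

```python
from collections import deque

# Neighbours in (time, row, col): one step along exactly one axis.
# Assumption: adjacency in space or time approximates "changed by a single event".
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def group_change_events(mask):
    """Group changed pixels into connected spatio-temporal components.

    mask[t][r][c] is truthy when pixel (r, c) is flagged as changed at time t.
    Returns a list of events, each a sorted list of (t, r, c) tuples.
    """
    n_times = len(mask)
    n_rows = len(mask[0]) if n_times else 0
    n_cols = len(mask[0][0]) if n_rows else 0

    seen = set()
    events = []
    for t in range(n_times):
        for r in range(n_rows):
            for c in range(n_cols):
                if not mask[t][r][c] or (t, r, c) in seen:
                    continue
                # Breadth-first search from an unvisited changed pixel.
                queue = deque([(t, r, c)])
                seen.add((t, r, c))
                event = []
                while queue:
                    ct, cr, cc = queue.popleft()
                    event.append((ct, cr, cc))
                    for dt, dr, dc in NEIGHBORS:
                        nt, nr, nc = ct + dt, cr + dr, cc + dc
                        inside = 0 <= nt < n_times and 0 <= nr < n_rows and 0 <= nc < n_cols
                        if inside and mask[nt][nr][nc] and (nt, nr, nc) not in seen:
                            seen.add((nt, nr, nc))
                            queue.append((nt, nr, nc))
                events.append(sorted(event))
    return events


# Two time steps of a 2x3 image: one pixel stays changed across both steps,
# another isolated pixel changes only at the first step.
example_mask = [
    [[1, 0, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 0, 0]],
]
example_events = group_change_events(example_mask)
```

Each returned group would still need a semantic label (e.g., forest fire or road construction), which this sketch does not attempt.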
Appendix
Here are the five models that we used, in increasing order of adversarial robustness: ε = 0, 0.5, 1.0, 3.0, 5.0. Three ImageNet-trained vision transformer (ViT) models [47] were obtained from pytorch-image-models [48]. Note that the "imagenet1k" suffix in the model names does not mean the model was only trained on ImageNet1K. Observation: A vision transformer (ViT-S) indeed shows higher error consistency with ResNet-50 than with BagNet-9 (see Table 1). Further insights could be gained by testing successively more constrained versions of the same base model.
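For readers unfamiliar with the metric, error consistency between two decision makers is commonly computed as Cohen's kappa over their per-trial correctness. The sketch below assumes that definition; the function name `error_consistency` and the input format (one boolean per trial, same trial order for both models) are illustrative choices, not taken from the excerpt.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa on trial-by-trial correctness of two models.

    correct_a, correct_b: equal-length sequences of truthy/falsy values,
    one entry per trial, indicating whether each model was correct.
    Returns a value in [-1, 1], or NaN when kappa is undefined.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    acc_a = sum(bool(x) for x in correct_a) / n
    acc_b = sum(bool(x) for x in correct_b) / n

    # Observed consistency: fraction of trials where both are right or both are wrong.
    c_obs = sum(bool(a) == bool(b) for a, b in zip(correct_a, correct_b)) / n
    # Consistency expected from two independent observers with these accuracies.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if math.isclose(c_exp, 1.0):
        return float("nan")  # undefined when chance agreement is already total
    return (c_obs - c_exp) / (1 - c_exp)


identical = error_consistency([1, 1, 0, 0], [1, 1, 0, 0])
independent = error_consistency([1, 1, 0, 0], [1, 0, 1, 0])
opposite = error_consistency([1, 1, 0, 0], [0, 0, 1, 1])
```

A kappa of 0 means the two models agree on errors no more than their accuracies alone would predict, while positive values indicate that they tend to fail on the same trials.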