




Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings Appendix

Neural Information Processing Systems

We provide the hyper-parameters of our models in Table A.1 (hyper-parameters used for training our VisualCSE and AudioCSE). As in Vision, we use Dropout augmentation (the same strategy as in SimCSE) for AudioCSE. We compare unsup-SimCSE and unsup-VisualCSE on a small-scale retrieval test. As shown in Table C.1, VisualCSE generally retrieves qualitatively different sentences than SimCSE.
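As a minimal sketch of the dropout-as-augmentation idea referenced above (the SimCSE strategy: encode the same input twice with independent dropout masks to form a positive pair), the toy feature vectors and function names below are illustrative, not the paper's implementation:

```python
import random

def dropout_embed(features, p=0.1, rng=None):
    """Inverted dropout on a feature vector: zero each unit with
    probability p, scale survivors by 1/(1-p)."""
    rng = rng or random
    return [0.0 if rng.random() < p else x / (1 - p) for x in features]

def positive_pair(features, p=0.1, rng=None):
    """SimCSE-style 'augmentation': the same input passed through
    dropout twice yields two different views that form a positive pair."""
    return dropout_embed(features, p, rng), dropout_embed(features, p, rng)

rng = random.Random(0)
view1, view2 = positive_pair([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng)
```

In training, the two views would be pulled together by a contrastive loss while views of other sentences in the batch act as negatives.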


Change Event Dataset for Discovery from Spatio-temporal Remote Sensing Imagery

Neural Information Processing Systems

Thus, instead of simply detecting changed pixels, we want to identify change events. We define a change event as a group of pixels over space and time that are all changed by a single event. We are interested in developing systems that can automatically detect change events and assign to each a semantic label that indicates the nature of the event, e.g., forest fires, road construction, etc. Identifying change events is a much more challenging problem than change detection.
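The grouping described above (changed pixels connected over space and time) can be sketched as a 3D connected-components pass over a binary change mask. This is an illustrative implementation, not the paper's pipeline; the 6-connectivity choice is an assumption:

```python
from collections import deque

def change_events(mask):
    """Group changed pixels into events: connected components over
    (time, row, col), where neighbors differ by 1 along one axis.
    `mask` is a nested list [t][y][x] of 0/1. Returns a label volume
    (0 = unchanged, event ids 1, 2, ...) and the number of events."""
    T, H, W = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * W for _ in range(H)] for _ in range(T)]
    next_id = 0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                if mask[t][y][x] and not labels[t][y][x]:
                    next_id += 1
                    labels[t][y][x] = next_id
                    queue = deque([(t, y, x)])
                    while queue:  # BFS flood fill through space-time
                        ct, cy, cx = queue.popleft()
                        for dt, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                            nt, ny, nx = ct + dt, cy + dy, cx + dx
                            if (0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                                    and mask[nt][ny][nx] and not labels[nt][ny][nx]):
                                labels[nt][ny][nx] = next_id
                                queue.append((nt, ny, nx))
    return labels, next_id

# Two frames: a persistent change at (y=0, x=0) and a one-frame change
# at (y=1, x=2) should be grouped into two separate events.
mask = [
    [[1, 0, 0],
     [0, 0, 1]],
    [[1, 0, 0],
     [0, 0, 0]],
]
labels, n_events = change_events(mask)
```

A semantic classifier would then assign each labeled group an event type (forest fire, road construction, etc.).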





Appendix

Neural Information Processing Systems

Here are the five models that we used, in increasing order of adversarial robustness: = 0, 0.5, 1.0, 3.0, 5.0. Three ImageNet-trained vision transformer (ViT) models [47] were obtained from pytorch-image-models [48]. Note that the "imagenet1k" suffix in the model names does not mean the model was only trained on ImageNet-1K. Observation: a vision transformer (ViT-S) indeed shows higher error consistency with ResNet-50 than with BagNet-9 (see Table 1). Further insights could be gained by testing successively more constrained versions of the same base model.
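Error consistency, as used in the observation above, is commonly computed as a kappa-style statistic: observed trial-by-trial agreement between two models, corrected for the agreement their accuracies alone would predict. This sketch assumes per-trial correctness records as boolean lists; it is not the authors' evaluation code:

```python
def error_consistency(correct_a, correct_b):
    """Kappa-style error consistency between two models, given parallel
    lists of per-trial booleans (True = model answered correctly)."""
    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n
    # Observed fraction of trials where the models agree
    # (both correct or both wrong).
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    # Agreement expected for independent models with these accuracies.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)
    return (c_obs - c_exp) / (1 - c_exp)

# Identical error patterns give consistency 1; independent-looking
# patterns give consistency near 0.
k_same = error_consistency([True, True, False, False],
                           [True, True, False, False])
k_indep = error_consistency([True, True, False, False],
                            [True, False, True, False])
```

A positive value means the two models tend to fail on the same inputs more often than chance, which is the signal being compared across ViT-S, ResNet-50, and BagNet-9.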


f3ada80d5c4ee70142b17b8192b2958e-Supplemental.pdf

Neural Information Processing Systems

First, a random patch of the image is selected and resized to 224 × 224 with a random horizontal flip, followed by a color distortion consisting of a random sequence of brightness, contrast, saturation, and hue adjustments, and an optional grayscale conversion. Finally, Gaussian blur and solarization are applied to the patches.

Optimization. We use the LARS optimizer [70] with a cosine decay learning rate schedule [71], without restarts, over 1000 epochs, with a warm-up period of 10 epochs. We set the base learning rate to 0.2, scaled linearly [72] with the batch size (LearningRate = 0.2 × BatchSize/256). For the target network, the exponential moving average parameter τ starts from τ_base = 0.996 and is increased to one during training.
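The learning-rate scaling rule and the τ ramp above can be sketched as follows. The linear scaling is stated explicitly in the text; the cosine form of the τ schedule is an assumption (the text only says τ increases from 0.996 to one over training):

```python
import math

def scaled_lr(base_lr=0.2, batch_size=256):
    """Linear learning-rate scaling: LearningRate = base_lr * BatchSize / 256."""
    return base_lr * batch_size / 256

def tau_schedule(step, total_steps, tau_base=0.996):
    """EMA coefficient for the target network, ramped from tau_base to 1.
    A cosine ramp is assumed here; the target weights would be updated as
    target = tau * target + (1 - tau) * online at each step."""
    return 1 - (1 - tau_base) * (math.cos(math.pi * step / total_steps) + 1) / 2

lr = scaled_lr(0.2, 512)            # batch size 512 doubles the base rate
tau_start = tau_schedule(0, 1000)   # equals tau_base at step 0
tau_end = tau_schedule(1000, 1000)  # reaches 1 at the final step
```

Increasing τ toward one means the target network changes ever more slowly, stabilizing the regression targets late in training.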