Global Minimizers of Sigmoid Contrastive Loss
Neural Information Processing Systems
The meta-task of obtaining and aligning representations through contrastive pretraining has steadily gained importance since its introduction in CLIP and ALIGN. In this paper, we theoretically explain the advantages of synchronizing with trainable inverse temperature and bias under the sigmoid loss, as implemented in the recent SigLIP and SigLIP2 models of Google DeepMind. Temperature and bias can drive the loss function to zero for a rich class of configurations that we call $(\mathsf{m}, \mathsf{br})$-Constellations.
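The role of temperature and bias in the sigmoid loss can be illustrated with a small sketch. It assumes the standard SigLIP formulation: each pair (i, j) in a batch of n matched pairs gets the logit t * <u_i, v_j> + b, with label +1 when i = j and -1 otherwise, and the loss is the sum of -log sigmoid(label * logit) over all pairs, divided by n. The helper names `log_sigmoid` and `sigmoid_contrastive_loss` are hypothetical, and the toy configuration (three orthonormal vectors, with the bias set to b = -t/2 so that it sits halfway between the positive inner product 1 and the negative inner product 0) is an illustrative assumption, not the paper's general $(\mathsf{m}, \mathsf{br})$-Constellation construction.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_contrastive_loss(U, V, t, b):
    """Pairwise sigmoid loss for n matched pairs (U[i], V[i]).

    Hypothetical helper, assuming the SigLIP-style formulation:
    U, V are equal-length lists of vectors (lists of floats),
    t is the inverse temperature and b is the bias.
    Pair (i, j) has logit t * <U[i], V[j]> + b and label +1 if i == j else -1.
    """
    n = len(U)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = t * sum(a * c for a, c in zip(U[i], V[j])) + b
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * logit)
    # Normalize by the number of matched pairs (the batch size).
    return total / n


# Toy configuration (an assumption for illustration): positives have
# inner product 1, negatives have inner product 0.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Scale the inverse temperature up while keeping the bias at -t/2.
for t in (1.0, 10.0, 100.0):
    print(t, sigmoid_contrastive_loss(basis, basis, t, -t / 2))
```

With b = -t/2, every positive logit equals t/2 and every negative logit equals -t/2, so each of the n^2 terms contributes log(1 + e^{-t/2}) and the loss equals n * log(1 + e^{-t/2}). It is strictly positive for any finite t but tends to zero as t grows, a small instance of temperature and bias driving the loss to zero for a configuration that separates positive from negative pairs.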
Jun-13-2026, 01:06:00 GMT