
A.1 The Pólya-Gamma augmentation

A random variable ω has a Pólya-Gamma distribution, ω ∼ PG(b, c), if it can be written as an infinite sum of independent gamma random variables:

\[
\omega \overset{D}{=} \frac{1}{2\pi^2} \sum_{k=1}^{\infty} \frac{g_k}{(k - 1/2)^2 + c^2 / (4\pi^2)},
\qquad g_k \stackrel{\text{iid}}{\sim} \mathrm{Gamma}(b, 1).
\]
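As a loose illustration (not from the excerpt), the infinite sum above can be truncated to draw approximate PG(b, c) samples; the truncation level `num_terms` below is an arbitrary choice, and dedicated exact samplers are used in practice.

```python
import numpy as np

def sample_polya_gamma(b, c, num_terms=200, rng=None):
    """Approximate PG(b, c) draw by truncating the infinite sum
    omega = (1 / (2*pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4*pi^2)),
    with g_k ~ Gamma(b, 1) i.i.d. Truncation is a heuristic; real
    implementations use exact samplers (e.g. Polson-Scott-Windle).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, num_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=num_terms)   # g_k ~ Gamma(b, 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2)
    return np.sum(g / denom) / (2.0 * np.pi ** 2)

# Example draw:
omega = sample_polya_gamma(b=1.0, c=0.5)
```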

Neural Information Processing Systems

Given a training dataset D = (X, y) of features and corresponding labels from {1, ..., T} classes, D is partitioned recursively into two subsets, according to classes, at each tree level until reaching leaf nodes with data from only one class. More concretely, feature vectors are first obtained for all samples (using a neural network); then, for every class, a class prototype is generated by averaging the feature vectors belonging to that class.
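A minimal sketch of this construction, assuming (since the excerpt does not specify a split rule) that each two-way split is obtained by 2-means clustering of the class prototypes:

```python
import numpy as np
from sklearn.cluster import KMeans

def class_prototypes(features, labels):
    """One prototype per class: the mean of that class's feature vectors."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def build_label_tree(classes, protos):
    """Recursively partition the classes in two until each leaf holds a
    single class. The 2-means split is an assumption; the excerpt only
    states that the partition is two-way and class-based."""
    if len(classes) == 1:
        return classes[0]                       # leaf: a single class
    side = KMeans(n_clusters=2, n_init=10).fit_predict(protos)
    left, right = side == 0, side == 1
    if left.all() or right.all():               # guard a degenerate split
        left = np.zeros(len(classes), dtype=bool)
        left[0] = True
        right = ~left
    return (build_label_tree(classes[left], protos[left]),
            build_label_tree(classes[right], protos[right]))
```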



Neural Information Processing Systems

We show that the matrix perspective function, which is jointly convex in the Cartesian product of a standard Euclidean vector space and a conformal space of symmetric matrices, has a proximity operator in an almost closed form.
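For context (a textbook fact, not a statement of this paper's construction): a classical example of a function that is jointly convex in a Euclidean vector and a symmetric positive definite matrix is the matrix-fractional function.

```latex
% Matrix-fractional function (Boyd & Vandenberghe, Convex Optimization,
% Sec. 3.1.7): jointly convex in (x, Y) on R^n x S^n_{++}.
\[
  g(x, Y) \;=\; x^\top Y^{-1} x
\]
% The matrix perspective function studied in the paper lives on the same
% kind of product space and enjoys the same joint convexity.
```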




Metric space valued Fréchet regression

Györfi, László; Humbert, Pierre; Le Bars, Batiste

arXiv.org Machine Learning

We consider the problem of estimating the Fréchet and conditional Fréchet mean from data taking values in separable metric spaces. Unlike Euclidean spaces, where well-established methods are available, there is no practical estimator that works universally for all metric spaces. Therefore, we introduce a computable estimator for the Fréchet mean based on random quantization techniques and establish its universal consistency on any separable metric space. Additionally, we propose another estimator for the conditional Fréchet mean, leveraging data-driven partitioning and quantization, and demonstrate its universal consistency when the output space is any Banach space.
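To make the quantization idea concrete (an illustrative sketch, not the authors' estimator): restrict the Fréchet-mean minimization, argmin_m (1/n) Σ_i d(X_i, m)², to a finite data-driven codebook, here simply the first k sample points.

```python
import numpy as np

def quantized_frechet_mean(points, dist, k):
    """Sketch of a quantization-style Frechet mean estimate: minimize the
    empirical mean squared distance, but only over the first k sample
    points (the 'codebook') instead of the whole metric space."""
    candidates = points[:k]
    costs = [np.mean([dist(c, x) ** 2 for x in points]) for c in candidates]
    return candidates[int(np.argmin(costs))]

# Usage with the Euclidean distance on R^2 (any metric works):
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
m = quantized_frechet_mean(pts, lambda a, b: np.linalg.norm(a - b), k=20)
```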



CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

Neural Information Processing Systems

This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a 0-shot way, similar to Contrastive Language-Image Pre-training (CLIP) and Locked-image Tuning (LiT) that have recently gained considerable attention. Classical contrastive training employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more 'non-binary' treatment. To address this, we propose a new contrastive loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to transfer the structure of the embedding space from one modality to another. Owing to the continuous nature of similarity in the proposed loss function, these models outperform existing methods for 0-shot transfer across multiple models, datasets and modalities. By using publicly available datasets, we achieve 5-8% (absolute) improvement over previous state-of-the-art methods in 0-shot image classification and 20-30% (absolute) improvement in 0-shot speech-to-intent classification and keyword classification.
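A minimal sketch of the continuously weighted idea (the exact CWCL formula in the paper may differ): replace the one-hot targets of a standard CLIP/LiT-style contrastive loss with per-pair weights derived from intra-modal similarity in the frozen source modality.

```python
import numpy as np

def cwcl_loss(src_emb, tgt_emb, tau=0.07):
    """Sketch of a continuously weighted contrastive loss: instead of
    treating only the diagonal pairs as positives (as in CLIP/LiT),
    weight every pair (i, j) by how similar samples i and j are inside
    the frozen source modality. Normalization choices are assumptions."""
    def l2norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    u, v = l2norm(src_emb), l2norm(tgt_emb)
    logits = (u @ v.T) / tau                        # cross-modal similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    w = (u @ u.T + 1.0) / 2.0                       # intra-modal weights in [0, 1]
    w /= w.sum(axis=1, keepdims=True)               # normalize per anchor
    return float(-(w * log_p).sum(axis=1).mean())
```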


Lifting Weak Supervision To Structured Prediction

Neural Information Processing Systems

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from various sources. WS is theoretically well-understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g.
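To make the binary-case claim concrete, here is one well-known noise-rate estimation device from the WS literature (the "triplet method" of Fu et al., 2020), not necessarily the exact construction used in this paper: for three conditionally independent sources emitting labels in {-1, +1}, the accuracies a_i = E[λ_i y] satisfy E[λ_i λ_j] = a_i a_j, so pairwise agreement rates determine the accuracies directly.

```python
import numpy as np

def triplet_accuracies(L):
    """Estimate per-source accuracies a_i = E[lambda_i * y] for binary
    {-1, +1} weak labels L of shape (n_samples, 3), assuming the sources
    are conditionally independent given the true label y. Then
    E[lambda_i lambda_j] = a_i * a_j, hence
    a_i = sqrt(E[l_i l_j] * E[l_i l_k] / E[l_j l_k])."""
    M = (L.T @ L) / len(L)          # empirical pairwise agreement moments
    return np.array([
        np.sqrt(abs(M[0, 1] * M[0, 2] / M[1, 2])),
        np.sqrt(abs(M[0, 1] * M[1, 2] / M[0, 2])),
        np.sqrt(abs(M[0, 2] * M[1, 2] / M[0, 1])),
    ])                              # accuracies, up to a global sign
```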