pairwise loss
Three Things to Know about Deep Metric Learning
Patel, Yash, Tolias, Giorgos, Matas, Jiri
This paper addresses supervised deep metric learning for open-set image retrieval, focusing on three key aspects: the loss function, mixup regularization, and model initialization. In deep metric learning, optimizing the retrieval evaluation metric, recall@k, via gradient descent is desirable but challenging due to its non-differentiable nature. To overcome this, we propose a differentiable surrogate loss that is computed on large batches, nearly equivalent to the entire training set. This computationally intensive process is made feasible through an implementation that bypasses the GPU memory limitations. Additionally, we introduce an efficient mixup regularization technique that operates on pairwise scalar similarities, effectively increasing the batch size even further. The training process is further enhanced by initializing the vision encoder using foundational models, which are pre-trained on large-scale datasets. Through a systematic study of these components, we demonstrate that their synergy enables large models to nearly solve popular benchmarks.
A Self-boosted Framework for Calibrated Ranking
Zhang, Shunyu, Liu, Hu, Bao, Wentian, Yu, Enyun, Song, Yang
Scale-calibrated ranking systems are ubiquitous in real-world applications nowadays, which pursue accurate ranking quality and calibrated probabilistic predictions simultaneously. For instance, in the advertising ranking system, the predicted click-through rate (CTR) is utilized for ranking and required to be calibrated for the downstream cost-per-click ads bidding. Recently, multi-objective based methods have been wildly adopted as a standard approach for Calibrated Ranking, which incorporates the combination of two loss functions: a pointwise loss that focuses on calibrated absolute values and a ranking loss that emphasizes relative orderings. However, when applied to industrial online applications, existing multi-objective CR approaches still suffer from two crucial limitations. First, previous methods need to aggregate the full candidate list within a single mini-batch to compute the ranking loss. Such aggregation strategy violates extensive data shuffling which has long been proven beneficial for preventing overfitting, and thus degrades the training effectiveness. Second, existing multi-objective methods apply the two inherently conflicting loss functions on a single probabilistic prediction, which results in a sub-optimal trade-off between calibration and ranking. To tackle the two limitations, we propose a Self-Boosted framework for Calibrated Ranking (SBCR).
Contrasting Multiple Representations with the Multi-Marginal Matching Gap
Piran, Zoe, Klein, Michal, Thornton, James, Cuturi, Marco
Learning meaningful representations of complex objects that can be seen through multiple ($k\geq 3$) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to $k$ views, either by instantiating $\tfrac12k(k-1)$ loss-pairs, or by using reduced embeddings, following a \textit{one vs. average-of-rest} strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to simultaneously incorporate all $k$ views. Given a batch of $n$ points, each seen as a $k$-tuple of views subsequently transformed into $k$ embeddings, our loss contrasts the cost of matching these $n$ ground-truth $k$-tuples with the MM-OT polymatching cost, which seeks $n$ optimally arranged $k$-tuples chosen within these $n\times k$ vectors. While the exponential complexity $O(n^k$) of the MM-OT problem may seem daunting, we show in experiments that a suitable generalization of the Sinkhorn algorithm for that problem can scale to, e.g., $k=3\sim 6$ views using mini-batches of size $64~\sim128$. Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.
BoxSnake: Polygonal Instance Segmentation with Box Supervision
Yang, Rui, Song, Lin, Ge, Yixiao, Li, Xiu
Box-supervised instance segmentation has gained much attention as it requires only simple box annotations instead of costly mask or polygon annotations. However, existing box-supervised instance segmentation models mainly focus on mask-based frameworks. We propose a new end-to-end training technique, termed BoxSnake, to achieve effective polygonal instance segmentation using only box annotations for the first time. Our method consists of two loss functions: (1) a point-based unary loss that constrains the bounding box of predicted polygons to achieve coarse-grained segmentation; and (2) a distance-aware pairwise loss that encourages the predicted polygons to fit the object boundaries. Compared with the mask-based weakly-supervised methods, BoxSnake further reduces the performance gap between the predicted segmentation and the bounding box, and shows significant superiority on the Cityscapes dataset. The code has been available publicly.
Adaptively Weighted Top-N Recommendation for Organ Matching
Shojaee, Parshin, Chen, Xiaoyu, Jin, Ran
Reducing the shortage of organ donations to meet the demands of patients on the waiting list has being a major challenge in organ transplantation. Because of the shortage, organ matching decision is the most critical decision to assign the limited viable organs to the most suitable patients. Currently, organ matching decisions were only made by matching scores calculated via scoring models, which are built by the first principles. However, these models may disagree with the actual post-transplantation matching performance (e.g., patient's post-transplant quality of life (QoL) or graft failure measurements). In this paper, we formulate the organ matching decision-making as a top-N recommendation problem and propose an Adaptively Weighted Top-N Recommendation (AWTR) method. AWTR improves performance of the current scoring models by using limited actual matching performance in historical data set as well as the collected covariates from organ donors and patients. AWTR sacrifices the overall recommendation accuracy by emphasizing the recommendation and ranking accuracy for top-N matched patients. The proposed method is validated in a simulation study, where KAS [60] is used to simulate the organ-patient recommendation response. The results show that our proposed method outperforms seven state-of-the-art top-N recommendation benchmark methods.
Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization
Wu, Guoqiang, Li, Chongxuan, Xu, Kun, Zhu, Jun
(Partial) ranking loss is a commonly used evaluation measure for multi-label classification, which is usually optimized with convex surrogates for computational efficiency. Prior theoretical work on multi-label ranking mainly focuses on (Fisher) consistency analyses. However, there is a gap between existing theory and practice -- some pairwise losses can lead to promising performance but lack consistency, while some univariate losses are consistent but usually have no clear superiority in practice. In this paper, we attempt to fill this gap through a systematic study from two complementary perspectives of consistency and generalization error bounds of learning algorithms. Our results show that learning algorithms with the consistent univariate loss have an error bound of $O(c)$ ($c$ is the number of labels), while algorithms with the inconsistent pairwise loss depend on $O(\sqrt{c})$ as shown in prior work. This explains that the latter can achieve better performance than the former in practice. Moreover, we present an inconsistent reweighted univariate loss-based learning algorithm that enjoys an error bound of $O(\sqrt{c})$ for promising performance as well as the computational efficiency of univariate losses. Finally, experimental results validate our theoretical analyses.
Attention-based Saliency Hashing for Ophthalmic Image Retrieval
Fang, Jiansheng, Xu, Yanwu, Zhang, Xiaoqing, Hu, Yan, Liu, Jiang
Deep hashing methods have been proved to be effective for the large-scale medical image search assisting reference-based diagnosis for clinicians. However, when the salient region plays a maximal discriminative role in ophthalmic image, existing deep hashing methods do not fully exploit the learning ability of the deep network to capture the features of salient regions pointedly. The different grades or classes of ophthalmic images may be share similar overall performance but have subtle differences that can be differentiated by mining salient regions. To address this issue, we propose a novel end-to-end network, named Attention-based Saliency Hashing (ASH), for learning compact hash-code to represent ophthalmic images. ASH embeds a spatial-attention module to focus more on the representation of salient regions and highlights their essential role in differentiating ophthalmic images. Benefiting from the spatial-attention module, the information of salient regions can be mapped into the hash-code for similarity calculation. In the training stage, we input the image pairs to share the weights of the network, and a pairwise loss is designed to maximize the discriminability of the hash-code. In the retrieval stage, ASH obtains the hash-code by inputting an image with an end-to-end manner, then the hash-code is used to similarity calculation to return the most similar images. Extensive experiments on two different modalities of ophthalmic image datasets demonstrate that the proposed ASH can further improve the retrieval performance compared to the state-of-the-art deep hashing methods due to the huge contributions of the spatial-attention module.
Metric learning: cross-entropy vs. pairwise losses
Boudiaf, Malik, Rony, Jรฉrรดme, Ziko, Imtiaz Masud, Granger, Eric, Pedersoli, Marco, Piantanida, Pablo, Ayed, Ismail Ben
Recently, substantial research efforts in Deep Metric Learning (DML) focused on designing complex pairwise-distance losses and convoluted sample-mining and implementation strategies to ease optimization. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. These findings indicate that the cross-entropy represents a proxy for maximizing the mutual information -- as pairwise losses do -- without the need for complex sample-mining and optimization schemes. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our experiments over four standard DML benchmarks (CUB200, Cars-196, Stanford Online Product and In-Shop) strongly support our findings. We consistently obtained state-of-the-art results, outperforming many recent and complex DML methods.
Agent Prioritization for Autonomous Navigation
Refaat, Khaled S., Ding, Kai, Ponomareva, Natalia, Ross, Stรฉphane
In autonomous navigation, a planning system reasons about other agents to plan a safe and plausible trajectory. Before planning starts, agents are typically processed with computationally intensive models for recognition, tracking, motion estimation and prediction. With limited computational resources and a large number of agents to process in real time, it becomes important to efficiently rank agents according to their impact on the decision making process. This allows spending more time processing the most important agents. We propose a system to rank agents around an autonomous vehicle (AV) in real time. We automatically generate a ranking data set by running the planner in simulation on real-world logged data, where we can afford to run more accurate and expensive models on all the agents. The causes of various planner actions are logged and used for assigning ground truth importance scores. The generated data set can be used to learn ranking models. In particular, we show the utility of combining learned features, via a convolutional neural network, with engineered features designed to capture domain knowledge. We show the benefits of various design choices experimentally. When tested on real AVs, our system demonstrates the capability of understanding complex driving situations.
Adapting control policies from simulation to reality using a pairwise loss
Viereck, Ulrich, Saenko, Kate, Platt, Robert
This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot. We explore the idea in the context of a "category level" manipulation task where a control policy is learned that enables a robot to perform a mating task involving novel objects. We explore the case where depth images are used as the main form of sensor input. Our experimental results demonstrate that proposed method consistently outperforms baseline methods that train only in simulation or that combine real and simulated data in a naive way. Recently, there has been a lot of interest in using deep neural networks to learn "pixels-to-torques" visuomotor controllers: robotic controllers that take sequential image data as input and produce low level motor commands as output.