Goto

Collaborating Authors

 foreground


Supplementary Materials for the Paper " L2T-DLN: Learning to Teach with Dynamic Loss Network "

Neural Information Processing Systems

In this supplementary material, we provide the proofs of convergence analysis in Section 1, 1-vs-1 transformation employed in the classification and semantic segmentation tasks in Section 2, the coordinate-wise and the preprocessing method of the LSTM teacher in Section 3, the loss functions of YOLO-v3 in Section 4, more experiments of image classification in Section 5, and the inferences of semantic segmentation in Section 6. A differentiable function e()is L-smooth with gradient Lipschitz constant C (uniformly Lipschitz continuous), if e(x) e(y) C x y, x,y. The function is called block-wise smooth with gradient Lipschitz Ci, if i e(x i,xi) ie(x i,x i) Ci xi x i, x,x (1) or with gradient Lipschitz constants { Ci}, if i e(x i,xi) ie(x i,xi) Ci x i x i, x,x (2) Further, Let Gmax max{Ci, Ci, k} C. Definition 2. For a differentiable function e(), if e(x) = 0, then x is a first-order stationary solution (SS1). For a differentiable function e(), if x is a SS1, and there exists ϵ > 0 so that for any y in the ϵ-neighborhood of x, we have e(x) e(y), then xis a local minimum. A saddle point xis an SS1 that is not a local minimum. If λmin( 2e(x)) < 0, x is a strict (non-degenerate) saddle point.


Supplementary

Neural Information Processing Systems

Contents1 1 PrinCut 22 1.1 How to use PrinCut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Do not distribute. 1 PrinCut22 1.1 How to use PrinCut23 The PrinCut GUI is shown in Figure 1. PrinCut is a MATLAB app, and its package is also provided24 in the supplementary. The left shows raw data without annotation. The right shows both raw data and annotation overlay.


Fetch and Forge: Efficient Dataset Condensation for Object Detection

Neural Information Processing Systems

Dataset condensation (DC) is an emerging technique capable of creating compact synthetic datasets from large originals while maintaining considerable performance. It is crucial for accelerating network training and reducing data storage requirements. However, current research on DC mainly focuses on image classification, with less exploration of object detection.This is primarily due to two challenges: (i) the multitasking nature of object detection complicates the condensation process, and (ii) Object detection datasets are characterized by large-scale and high-resolution data, which are difficult for existing DC methods to handle.As a remedy, we propose DCOD, the first dataset condensation framework for object detection. It operates in two stages: Fetch and Forge, initially storing key localization and classification information into model parameters, and then reconstructing synthetic images via model inversion. For the complex of multiple objects in an image, we propose Foreground Background Decoupling to centrally update the foreground of multiple instances and Incremental PatchExpand to further enhance the diversity of foregrounds.Extensive experiments on various detection datasets demonstrate the superiority of DCOD. Even at an extremely low compression rate of 1\%, we achieve 46.4\% and 24.7\% $\text{AP}_{50}$ on the VOC and COCO, respectively, significantly reducing detector training duration.


Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model

Neural Information Processing Systems

Existing multi-modal image fusion methods fail to address the compound degradations presented in source images, resulting in fusion images plagued by noise, color bias, improper exposure, etc. Additionally, these methods often overlook the specificity of foreground objects, weakening the salience of the objects of interest within the fused images. To address these challenges, this study proposes a novel interactive multi-modal image fusion framework based on the text-modulated diffusion model, called Text-DiFuse.