
Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications -- Supplementary Material
Daniel Lee, Aaron Roth

Neural Information Processing Systems

Papers by Azar et al. [2014] and Kesselheim and Singla [2020] study a related problem: an online setting with vector-valued losses, where the goal is to minimize the ℓ∞ norm of the sum of the loss vectors over time, i.e., the maximum over coordinates of the cumulative loss. On the one hand, this benchmark is stronger than ours in the sense that the maximum over coordinates is taken outside the sum over time, whereas our benchmark considers a "greedy" per-round maximum. On the other hand, in our setting the game can differ at every round, so our benchmark allows a comparison to a different action at each round rather than to a single fixed action. In the setting of Kesselheim and Singla [2020], it is impossible to give any regret bound with respect to their benchmark, so they instead derive an algorithm obtaining a log(d) competitive ratio against it. In contrast, our benchmark admits a regret bound. Hence, our results are quite different in kind despite the outward similarity of the settings: none of our applications follow from their theorems (since all of our applications derive regret bounds). A different line of work [Rakhlin et al., 2010, 2011] takes a very general minimax approach to deriving bounds in online learning, including regret minimization, calibration, and approachability.
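To make the contrast concrete, here is a schematic rendering of the two benchmarks, with ℓ_t^j denoting the j-th coordinate of the loss at round t over a horizon T (the notation here is ours, chosen for illustration, not the papers'):

```latex
% Benchmark of Azar et al. / Kesselheim and Singla: the maximum over
% coordinates sits outside the sum over time (the l-infinity norm of
% the cumulative loss vector).
\max_{j \in [d]} \sum_{t=1}^{T} \ell_t^j

% The "greedy" benchmark of this paper: a per-round maximum over
% coordinates, summed over time, which permits comparison against a
% different action at each round.
\sum_{t=1}^{T} \max_{j \in [d]} \ell_t^j
```

Since max_j Σ_t ℓ_t^j ≤ Σ_t max_j ℓ_t^j for any loss sequence, the first quantity is the smaller, and hence harder, comparison target in this respect.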



NVRC: Neural Video Representation Compression

Neural Information Processing Systems

Recent advances in implicit neural representation (INR)-based video coding have demonstrated the approach's potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, and its parameters are then compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still outperformed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed.
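As a toy illustration of the INR paradigm described above (not the NVRC method itself), one can overfit a small coordinate network to a single clip and then compress its parameters; all class and function names here are hypothetical:

```python
import torch
import torch.nn as nn

class VideoINR(nn.Module):
    """Tiny coordinate MLP mapping normalized (x, y, t) to RGB.

    A toy stand-in for INR-based video coding: the network is trained
    to overfit one video, so its (compressed) parameters become the
    representation of that video."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, coords):  # coords: (N, 3) in [-1, 1]
        return self.net(coords)

def overfit(model, coords, rgb, steps=1000, lr=1e-3):
    """Overfit the INR to one clip (coords/rgb are flattened pixels)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(coords), rgb)
        loss.backward()
        opt.step()
    return model

def quantize_weights(model, num_bits=8):
    """Crude uniform post-training quantization of the parameters,
    standing in for the model-compression stage the abstract refers to
    (NVRC's actual compression pipeline is more sophisticated)."""
    levels = 2 ** num_bits - 1
    with torch.no_grad():
        for p in model.parameters():
            lo, hi = p.min(), p.max()
            scale = (hi - lo) / levels if hi > lo else 1.0
            p.copy_(torch.round((p - lo) / scale) * scale + lo)
    return model
```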


Supplementary Material for Temporal Dynamic Quantization for Diffusion Models

Neural Information Processing Systems

In this supplementary material, we present the results of the experiments mentioned in the paper, along with additional experiments. The following items are provided:
- A comparison between dynamic quantization and TDQ (Section 2).
- An ablation study on time step encoding (Section 3).
- The detailed architecture of the TDQ module (Section 4).
- A comparison with multiple quantization intervals applied directly in PTQ (Section 5).
- Integration of TDQ with various QAT schemes (Section 6).
- Experiments on the robustness of the TDQ module (Section 7).
- Detailed results on the output dynamics of the TDQ module (Section 8).
- Detailed results on the evolution of the activation distribution (Section 9).
- Non-cherry-picked examples of generated images (Section 10).


Temporal Dynamic Quantization for Diffusion Models
Junhyuk So, Jungwon Lee, Hyungjun Kim

Neural Information Processing Systems

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even at 8-bit precision due to the diffusion model's unique property of temporal variation in activations. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach incurs no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
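A minimal sketch of the stated idea, assuming a small MLP maps a sinusoidal time-step embedding to a per-step quantization interval that can be precomputed into a lookup table (hence no overhead at inference); the module and parameter names are ours, not the paper's:

```python
import torch
import torch.nn as nn

class TimestepIntervalPredictor(nn.Module):
    """Maps a sinusoidal time-step embedding to a positive quantization
    interval (scale). Because the interval depends only on the integer
    time step, it can be precomputed for all steps ahead of time."""
    def __init__(self, emb_dim=64, hidden=128):
        super().__init__()
        self.emb_dim = emb_dim
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # interval > 0
        )

    def embed(self, t):
        """Standard sinusoidal embedding of integer time steps."""
        half = self.emb_dim // 2
        freqs = torch.exp(
            -torch.arange(half, dtype=torch.float32)
            * (torch.log(torch.tensor(10000.0)) / half)
        )
        args = t.float()[:, None] * freqs[None, :]
        return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

    def forward(self, t):
        return self.mlp(self.embed(t))  # (B, 1) per-step interval

def fake_quantize(x, interval, num_bits=8):
    """Uniform symmetric fake quantization with the predicted interval."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = interval / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# Usage: precompute one interval per diffusion step, then look it up,
# so inference reduces to a table read plus ordinary static quantization.
predictor = TimestepIntervalPredictor()
table = predictor(torch.arange(1000)).detach()   # (1000, 1) lookup table
act = torch.randn(16, 320)                       # a batch of activations
t = torch.full((16,), 500, dtype=torch.long)     # current time step
act_q = fake_quantize(act, table[t])
```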


A Appendix for Details of Deriving HTGM
A.1 The lower-bound of the likelihood function posterior q(v

Neural Information Processing Systems

In this section, we provide the details of the lower-bound in Eq. (3). This completes the derivation of Eq. (3). In other words, there is no overlap between any pair of balls. Therefore, we have the following derivation from Eq. (9). The training algorithm of HTGM is summarized in Algorithm 1.


Hierarchical Gaussian Mixture based Task Generative Model for Robust Meta-Learning

Neural Information Processing Systems

Meta-learning enables quick adaptation of machine learning models to new tasks with limited data. While tasks in practice may come from varying distributions, most existing meta-learning methods treat both training and testing tasks as coming from the same uni-component distribution, overlooking two critical needs of a practical solution: (1) the various sources of tasks may compose a multi-component mixture distribution, and (2) novel tasks may come from a distribution unseen during meta-training. In this paper, we demonstrate that these two challenges can be solved jointly by modeling the density of task instances. We develop a meta-training framework underlain by a novel Hierarchical Gaussian Mixture based Task Generative Model (HTGM). HTGM extends the widely used empirical process of sampling tasks to a theoretical model, which learns task embeddings, fits the mixture distribution of tasks, and enables density-based scoring of novel tasks. The framework is agnostic to the encoder and scales well with large backbone networks. The model parameters are learned end-to-end by maximum likelihood estimation via an Expectation-Maximization (EM) algorithm. Extensive experiments on benchmark datasets indicate the effectiveness of our method for both sample classification and novel task detection.
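To illustrate density-based scoring of novel tasks only (this is not the authors' hierarchical model or its end-to-end EM procedure), a flat Gaussian mixture fit to task embeddings already yields a likelihood-based novelty score; the embeddings and threshold below are synthetic placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical task embeddings: one vector per meta-training task,
# e.g. pooled encoder features of each task's support set.
rng = np.random.default_rng(0)
train_task_emb = rng.normal(size=(500, 32))

# Fit a (flat, non-hierarchical) mixture over task embeddings via EM.
# HTGM itself is hierarchical and trained jointly with the encoder;
# this stand-in only shows how a fitted density scores novel tasks.
gmm = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
gmm.fit(train_task_emb)

def novelty_score(task_emb):
    """Lower log-likelihood under the task density = more novel."""
    return -gmm.score_samples(task_emb)

# Score held-out tasks and flag low-likelihood ones as novel
# (in practice the threshold would be tuned on validation tasks).
test_task_emb = rng.normal(loc=3.0, size=(10, 32))  # shifted: "novel"
threshold = np.quantile(novelty_score(train_task_emb), 0.95)
is_novel = novelty_score(test_task_emb) > threshold
```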


FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding

Neural Information Processing Systems

Fine-grained urban change segmentation using multi-temporal remote sensing images is essential for understanding human-environment interactions in urban areas. Although there have been advances in high-quality land cover datasets that reveal the physical features of urban landscapes, the lack of fine-grained land use datasets hinders a deeper understanding of how human activities are distributed across the landscape and of the impact of these activities on the environment, thus constraining the development of proper techniques. To address this, we introduce FUSU, the first fine-grained land use change segmentation dataset for Fine-grained Urban Semantic Understanding. FUSU features the most detailed land use classification system to date, with 17 classes and 30 billion pixels of annotations. It includes bi-temporal high-resolution satellite images with 0.2-0.5 m ground sample distance and monthly optical and radar satellite time series, covering 847 km².
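To make the described contents concrete, here is a purely hypothetical per-tile sample schema inferred from the abstract; the field names, array shapes, and the 12-month series length are our assumptions, not the dataset's published interface:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FusuLikeSample:
    """Hypothetical per-tile sample mirroring the abstract's description:
    bi-temporal high-resolution imagery (0.2-0.5 m GSD), monthly optical
    and radar time series, and fine-grained land use masks (17 classes)."""
    image_t1: np.ndarray        # (H, W, 3) high-res image, earlier date
    image_t2: np.ndarray        # (H, W, 3) high-res image, later date
    optical_series: np.ndarray  # (12, h, w, C) monthly optical series
    radar_series: np.ndarray    # (12, h, w, 2) monthly SAR series
    mask_t1: np.ndarray         # (H, W) land use labels in {0..16}
    mask_t2: np.ndarray         # (H, W) land use labels in {0..16}

    def change_mask(self) -> np.ndarray:
        """Binary change map derived from the two label maps."""
        return (self.mask_t1 != self.mask_t2).astype(np.uint8)
```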