Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again

Neural Information Processing Systems

Knowledge Distillation (KD) aims at transferring the knowledge of a well-performing neural network (the teacher) to a weaker one (the student). A peculiar phenomenon is that a more accurate model doesn't necessarily teach better, and temperature adjustment cannot alleviate the mismatched capacity either. To explain this, we decompose the efficacy of KD into three parts: correct guidance, smooth regularization, and class discriminability. The last term describes the distinctness of the wrong class probabilities that the teacher provides in KD. Complex teachers tend to be over-confident, and traditional temperature scaling limits the efficacy of class discriminability, resulting in less discriminative wrong class probabilities. Therefore, we propose Asymmetric Temperature Scaling (ATS), which separately applies a higher/lower temperature to the correct/wrong class. ATS enlarges the variance of wrong class probabilities in the teacher's label so that students grasp the absolute affinities of wrong classes to the target class as discriminatively as possible. Both theoretical analysis and extensive experimental results demonstrate the effectiveness of ATS.
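The idea of applying separate temperatures to the correct and wrong classes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ats_softmax` and `wrong_class_variance`, the example logits, and the temperature values are all hypothetical, and renormalizing the wrong-class probabilities before measuring their variance is an assumption made here only to make the comparison easy to read.

```python
import math


def ats_softmax(logits, target, tau_correct, tau_wrong):
    """Softmax where the target logit and the other logits use different temperatures.

    Hypothetical helper: a higher tau_correct and a lower tau_wrong follow the
    "higher/lower temperature to the correct/wrong class" description above.
    """
    scaled = [
        z / (tau_correct if i == target else tau_wrong)
        for i, z in enumerate(logits)
    ]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def wrong_class_variance(probs, target):
    """Variance of the wrong-class probabilities after renormalizing them to sum to 1.

    The renormalization is an illustrative assumption, not taken from the abstract.
    """
    wrong = [p for i, p in enumerate(probs) if i != target]
    s = sum(wrong)
    wrong = [p / s for p in wrong]
    mean = sum(wrong) / len(wrong)
    return sum((p - mean) ** 2 for p in wrong) / len(wrong)


# Made-up logits of an over-confident teacher; class 0 is the correct class.
teacher_logits = [8.0, 3.0, 1.0, 0.5]

uniform = ats_softmax(teacher_logits, target=0, tau_correct=4.0, tau_wrong=4.0)
asymmetric = ats_softmax(teacher_logits, target=0, tau_correct=4.0, tau_wrong=2.0)

var_uniform = wrong_class_variance(uniform, target=0)
var_asymmetric = wrong_class_variance(asymmetric, target=0)
```

With equal temperatures the function reduces to ordinary temperature-scaled softmax. In this toy example, lowering only the wrong-class temperature makes the wrong-class probabilities more spread out relative to each other, which is the effect the abstract attributes to ATS.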


Who's in control of AI?

Al Jazeera

The owner of a US tech giant has revealed a breach of one of the world's most powerful AI models. Reports have emerged of unauthorised access to one of the most powerful Artificial Intelligence models yet developed. The owners say nothing malicious occurred, but the incident has intensified focus on the risk of such technology falling into the wrong hands. So, how is AI being controlled globally?


DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

Neural Information Processing Systems

We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior. While recent methods have shown encouraging results for text-to-3D generation of common objects, creating high-quality and animatable 3D avatars remains challenging. To create high-quality 3D avatars, DreamWaltz proposes 3D-consistent occlusion-aware Score Distillation Sampling (SDS) to optimize implicit neural representations with canonical poses. It provides view-aligned supervision via 3D-aware skeleton conditioning, which enables complex avatar generation without artifacts and multiple faces. For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses, which can animate complex non-rigged avatars given arbitrary poses without retraining. Extensive evaluations demonstrate that DreamWaltz is an effective and robust approach for creating 3D avatars that can take on complex shapes and appearances as well as novel poses for animation. The proposed framework further enables the creation of complex scenes with diverse compositions, including avatar-avatar, avatar-object and avatar-scene interactions.


"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Neural Information Processing Systems

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, so that they can be deployed on low-power IoT devices. In this paper, building upon recent advances in neural tangent kernel (NTK) and random matrix theory (RMT), we provide a novel compression approach to wide and fully-connected deep neural nets. Specifically, we demonstrate that in the high-dimensional regime where the number of data points n and their dimension p are both large, and under a Gaussian mixture model for the data, there exists asymptotic spectral equivalence between the NTK matrices for a large family of DNN models. This theoretical result enables "lossless" compression of a given DNN to be performed, in the sense that the compressed network yields asymptotically the same NTK as the original (dense and unquantized) network, with its weights and activations taking values only in {0, ±1} up to a scaling.





Appendix

Neural Information Processing Systems

We provide an overview of the Appendix below. We elaborate on the additional details needed to transfer our MQ-Det to downstream tasks, including finetuning-free, few-shot, and full-shot settings. The introduction is organized as follows. Different ways to acquire the vision queries in Sec. This is an elaborative description of Sec.