Goto

Collaborating Authors

 Asia


Appendix - An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild Table of Contents

Neural Information Processing Systems

We use the images at 256 256resolution. We follow [21] and use all the images for training. The images used for the qualitative visualizations contain random images from the web and samples from CelebA-HQ. AFHQ [8] 15,000high quality images categorized into three domains: cat, dog and wildlife. We use the images at 128 128 resolution, holding out 500 images from each domain for testing.


When Expressivity Meets Trainability: Fewer than n Neurons Can Work

Neural Information Processing Systems

Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape?



Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification

Neural Information Processing Systems

We consider the fixed-budget best arm identification problem where the goal is to find the arm of the largest mean with a fixed number of samples. It is known that the probability of misidentifying the best arm is exponentially small to the number of rounds. However, limited characterizations have been discussed on the rate (exponent) of this value. In this paper, we characterize the minimax optimal rate as a result of an optimization over all possible parameters. We introduce two rates, Rgo and Rgo, corresponding to lower bounds on the probability of misidentification, each of which is associated with a proposed algorithm. The rate Rgo is associated with Rgo-tracking, which can be efficiently implemented by a neural network and is shown to outperform existing algorithms. However, this rate requires a nontrivial condition to be achievable. To address this issue, we introduce the second rate Rgo . We show that this rate is indeed achievable by introducing a conceptual algorithm called delayed optimal tracking (DOT).



Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale

Neural Information Processing Systems

The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results. Therefore, the reflection on the rationality of designing knowledge distillation (KD) approaches for limited-capacity architectures solely based on small-scale datasets is now deemed imperative. In this paper, we identify the small data pitfall that presents in previous KD methods, which results in the underestimation of the power of vanilla KD framework on large-scale datasets such as ImageNet-1K. Specifically, we show that employing stronger data augmentation techniques and using larger datasets can directly decrease the gap between vanilla KD and other meticulously designed KD variants. This highlights the necessity of designing and evaluating KD approaches in the context of practical scenarios, casting off the limitations of small-scale datasets. Our investigation of the vanilla KD and its variants in more complex schemes, including stronger training strategies and different model capacities, demonstrates that vanilla KD is elegantly simple but astonishingly effective in large-scale scenarios. Without bells and whistles, we obtain state-of-the-art ResNet50, ViT-S, and ConvNeXtV2-T models for ImageNet, which achieve 83.1%, 84.3%, and 85.0% top-1 accuracy, respectively.


Interpreting Representation Quality of DNNs for 3DPoint Cloud Processing: Supplementary Materials Wen Shenb Qihan Rena Dongrui Liua Quanshi Zhanga aShanghai Jiao Tong UniversitybTongji University

Neural Information Processing Systems

This section provides more details about Shapley values in Section 3 of the paper. Linearity: If two independent games vand wcan be merged into one game u(S) = v(S)+w(S), then the Shapley value of the player i in game v and game w also can be merged, i.e. ฯ†u(i) = ฯ†v(i)+ฯ†w(i). Nullity: A dummy player isatisfies S N\{i},v(S {i}) = v(S)+v({i}), which indicates that the player ihas no interaction with other players, i.e. ฯ†(i) = v({i}). Efficiency: The overall reward can be allocated to all players in the game, i.e. This section provides more details about multi-order interactions [8] in Section 3.3 of the paper.


Interpreting Representation Quality of DNNs for 3DPoint Cloud Processing

Neural Information Processing Systems

In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to the rotation, the translation, the scale, and local 3D structures. Besides, we also propose metrics to evaluate the spatial smoothness of encoding 3D structures, and the representation complexity of the DNN. Based on such analysis, experiments expose representation problems with classic DNNs, and explain the utility of the adversarial training. The code will be released when this paper is accepted.



SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization

Neural Information Processing Systems

Multilayer-perceptrons (MLP) are known to struggle with learning functions of high-frequencies, and in particular cases with wide frequency bands. We present a spatially adaptive progressive encoding (SAPE) scheme for input signals of MLP networks, which enables them to better fit a wide range of frequencies without sacrificing training stability or requiring any domain specific preprocessing. SAPE gradually unmasks signal components with increasing frequencies as a function of time and space. The progressive exposure of frequencies is monitored by a feedback loop throughout the neural optimization process, allowing changes to propagate at different rates among local spatial portions of the signal space. We demonstrate the advantage of SAPE on a variety of domains and applications, including regression of low dimensional signals and images, representation learning of occupancy networks, and a geometric task of mesh transfer between 3D shapes.