Wang, Dequan
On-target Adaptation
Wang, Dequan, Liu, Shaoteng, Ebrahimi, Sayna, Shelhamer, Evan, Darrell, Trevor
Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain. Most adaptation methods rely on the source data through joint optimization over source and target data. Source-free methods instead replace the source data with a source model and fine-tune it on the target data. Either way, the majority of the parameter updates for the model representation and the classifier are derived from the source, not the target. However, target accuracy is the goal, so we argue for optimizing as much as possible on the target data. We show significant improvement by on-target adaptation, which learns the representation purely from target data while taking only the source predictions for supervision. In the long-tailed classification setting, we show further improvement by on-target class distribution learning, which learns the (im)balance of classes from the target data.
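The core recipe, a student trained only on target data while a frozen source model contributes nothing but its predictions, can be sketched as distillation. Below is a minimal PyTorch sketch under assumed names (`source_model`, `target_model`, `target_loader` are hypothetical); the paper's full pipeline, including its representation learning and class distribution learning stages, is not reproduced here:

```python
import torch
import torch.nn.functional as F

def on_target_distillation(source_model, target_model, target_loader,
                           epochs=10, lr=1e-3, temperature=1.0):
    """Train a target-only model using frozen source predictions as supervision.

    Illustrative sketch: the target model sees only target images; the
    source model is never updated and supplies only soft labels.
    """
    source_model.eval()  # frozen teacher trained on the source domain
    optimizer = torch.optim.SGD(target_model.parameters(), lr=lr, momentum=0.9)

    for _ in range(epochs):
        for images in target_loader:  # unlabeled target data
            with torch.no_grad():
                teacher_probs = F.softmax(source_model(images) / temperature, dim=1)
            student_log_probs = F.log_softmax(target_model(images) / temperature, dim=1)
            # match the student to the source model's soft predictions
            loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return target_model
```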
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Chen, Jianfei, Zheng, Lianmin, Yao, Zhewei, Wang, Dequan, Stoica, Ion, Mahoney, Michael W., Gonzalez, Joseph E.
The increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate. This creates fundamental challenges for training neural networks within limited memory environments. In this work, we propose ActNN, a memory-efficient training framework that stores randomly quantized activations for backpropagation. We prove the convergence of ActNN for general network architectures, and we characterize the impact of quantization on the convergence via an exact expression for the gradient variance. Using our theory, we propose novel mixed-precision quantization strategies that exploit the activation's heterogeneity across feature dimensions, samples, and layers. These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers. We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
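To make the idea concrete, here is a minimal PyTorch sketch of 2-bit stochastic activation quantization wrapped in a custom autograd function; this is an illustration, not ActNN's actual API. Real memory savings additionally require bit-packing (four 2-bit values per byte) and the per-group, mixed-precision strategies described above:

```python
import torch

def quantize_2bit(x):
    """Stochastically quantize a tensor to 2 bits (4 levels) per element.

    Per-sample min/max scaling; stochastic rounding makes the
    dequantized value an unbiased estimate of the input.
    """
    flat = x.reshape(x.shape[0], -1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / 3.0  # 4 levels -> 3 quantization steps
    normalized = (flat - lo) / scale
    q = torch.floor(normalized + torch.rand_like(normalized)).clamp(0, 3)
    return q.to(torch.uint8), lo, scale  # packing 4 values/byte omitted here

def dequantize_2bit(q, lo, scale, shape):
    return (q.float() * scale + lo).reshape(shape)

class QuantizedReLU(torch.autograd.Function):
    """ReLU that keeps only a 2-bit compressed copy of its input for backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.saved = (*quantize_2bit(x), x.shape)
        return x.relu()

    @staticmethod
    def backward(ctx, grad_out):
        q, lo, scale, shape = ctx.saved
        x_hat = dequantize_2bit(q, lo, scale, shape)  # approximate saved input
        return grad_out * (x_hat > 0).float()

# usage: y = QuantizedReLU.apply(x), in place of torch.relu(x)
```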
Fully Test-time Adaptation by Entropy Minimization
Wang, Dequan, Shelhamer, Evan, Liu, Shaoteng, Olshausen, Bruno, Darrell, Trevor
A model must adapt itself to generalize to new and different data during testing. This is the setting of fully test-time adaptation given only unlabeled test data and the model parameters. We propose test-time entropy minimization (tent) for adaptation: we optimize for model confidence as measured by the entropy of its predictions. During testing, we adapt the model features by estimating normalization statistics and optimizing channel-wise affine transformations. Tent improves robustness to corruptions for image classification on ImageNet and CIFAR-10/100 and achieves state-of-the-art error on ImageNet-C for ResNet-50. Tent demonstrates the feasibility of target-only domain adaptation for digit classification from SVHN to MNIST/MNIST-M/USPS and semantic segmentation from GTA to Cityscapes.

Deep networks can achieve high accuracy on training and testing data from the same distribution, as evidenced by tremendous benchmark progress (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016). However, generalization to new and different data is limited (Hendrycks & Dietterich, 2019; Recht et al., 2019; Geirhos et al., 2018). Accuracy suffers when the training (source) data differ from the testing (target) data, a condition known as dataset shift (Quionero-Candela et al., 2009). Models can be sensitive to shifts during testing that were not known during training, whether natural variations or corruptions, such as unexpected weather or sensor degradation.
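A minimal sketch of the tent update in PyTorch, assuming a BatchNorm-based classifier: only the channel-wise affine parameters receive gradients, while normalization statistics are re-estimated from each test batch.

```python
import torch
import torch.nn as nn

def configure_tent(model):
    """Prepare a model for tent: re-estimate normalization statistics from
    test batches and train only the channel-wise affine parameters."""
    model.train()                # use batch statistics at test time
    model.requires_grad_(False)  # freeze everything...
    params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.requires_grad_(True)  # ...except the BN affine transforms
            params += [m.weight, m.bias]
    return params

def entropy(probs):
    return -(probs * probs.clamp(min=1e-8).log()).sum(dim=1).mean()

@torch.enable_grad()
def tent_step(model, x, optimizer):
    """One adaptation step: minimize the entropy of predictions on batch x."""
    probs = model(x).softmax(dim=1)
    loss = entropy(probs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return probs.detach()

# usage sketch: optimizer = torch.optim.Adam(configure_tent(model), lr=1e-3),
# then call tent_step(model, x, optimizer) on each incoming test batch
```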
Convolutional Neural Networks on non-uniform geometrical signals using Euclidean spectral transformation
Jiang, Chiyu "Max", Wang, Dequan, Huang, Jingwei, Marcus, Philip, Nießner, Matthias
Convolutional Neural Networks (CNN) have been successful in processing data signals that are uniformly sampled in the spatial domain (e.g., images). However, most data signals do not natively exist on a grid, and in the process of being sampled onto a uniform physical grid they suffer significant aliasing error and information loss. Moreover, signals can exist in different topological structures as, for example, points, lines, surfaces, and volumes. It has been challenging to analyze signals with mixed topologies (for example, a point cloud with a surface mesh). To this end, we develop mathematical formulations for Non-Uniform Fourier Transforms (NUFT) to directly, and optimally, sample non-uniform data signals of different topologies defined on a simplex mesh into the spectral domain with no spatial sampling error. The spectral transform is performed in the Euclidean space, which removes the translation ambiguity from works on the graph spectrum. Our representation has four distinct advantages: (1) the process causes no spatial sampling error during the initial sampling, (2) the generality of this approach provides a unified framework for using CNNs to analyze signals of mixed topologies, (3) it allows us to leverage state-of-the-art backbone CNN architectures for effective learning without having to design a particular architecture for a particular data structure in an ad-hoc fashion, and (4) the representation allows weighted meshes where each element has a different weight (i.e., texture) indicating local properties. We achieve results on par with the state-of-the-art for the 3D shape retrieval task, and a new state-of-the-art for the point cloud to surface reconstruction task.
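As a toy illustration of spectral sampling without a spatial grid, the NumPy sketch below evaluates exact Fourier coefficients of a weighted point set (the 0-simplex case) by direct summation; the paper derives closed-form transforms for general simplices, which this sketch does not reproduce:

```python
import numpy as np

def nuft_points(points, weights, freqs):
    """Exact Fourier coefficients of a weighted point set (type-1 NUFT).

    points:  (N, d) locations in [0, 1)^d
    weights: (N,)   per-point weights (e.g., texture / local density)
    freqs:   (K, d) integer frequency vectors
    Returns (K,) complex spectrum: F[k] = sum_j w_j * exp(-2*pi*i <k, x_j>)
    """
    phase = -2j * np.pi * freqs @ points.T  # (K, N) phase matrix
    return np.exp(phase) @ weights          # no spatial grid, no aliasing error

# toy usage: 100 random 2-D points, uniform weights, a 9x9 frequency grid
rng = np.random.default_rng(0)
pts = rng.random((100, 2))
w = np.ones(100) / 100
kx, ky = np.meshgrid(np.arange(-4, 5), np.arange(-4, 5))
freqs = np.stack([kx.ravel(), ky.ravel()], axis=1)
spectrum = nuft_points(pts, w, freqs)
```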
Deep Object Centric Policies for Autonomous Driving
Wang, Dequan, Devin, Coline, Cai, Qi-Zhi, Yu, Fisher, Darrell, Trevor
While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways. For robotics tasks such as autonomous driving, models that explicitly represent objects may be more robust to new scenes and provide intuitive visualizations. We describe a taxonomy of "object-centric" models which leverage both object instances and end-to-end learning. In the Grand Theft Auto V simulator, we show that object-centric models outperform object-agnostic methods in scenes with other vehicles and pedestrians, even with an imperfect detector. We also demonstrate that our architectures perform well on real-world environments by evaluating on the Berkeley DeepDrive Video dataset.

End-to-end approaches to visuomotor learning are appealing in their ability to discover which features of an observed environment are most relevant for a task, and to exploit large amounts of training data to discover both a policy and a codependent visual representation. Yet the key benefit of such approaches, that they learn from task experience, is also their Achilles heel when it comes to many real-world settings, where behavioral training data is not unlimited and correct perception of "long-tail" visual phenomena can be critical for robust performance.
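As one hypothetical instance of the object-centric family, the PyTorch sketch below fuses a global image feature with pooled per-object features before a small control head; the names and dimensions are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ObjectCentricPolicy(nn.Module):
    """Illustrative object-centric driving policy: a global scene feature is
    fused with aggregated per-object features from an off-the-shelf detector."""

    def __init__(self, img_dim=512, obj_dim=256, n_actions=3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + obj_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),  # e.g., steer / throttle / brake
        )

    def forward(self, img_feat, obj_feats):
        # img_feat:  (B, img_dim)    global image representation
        # obj_feats: (B, N, obj_dim) per-object features (e.g., RoI-pooled)
        pooled = obj_feats.mean(dim=1)  # aggregate over object instances
        return self.fuse(torch.cat([img_feat, pooled], dim=1))
```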