We gratefully thank all reviewers for their valuable comments and will do our best to address them in the revision. Regarding R#2's comment on the discrepancy between the influential objects extracted from different explanations: we agree that a global measurement would make our claims stronger, and we will include this measurement in our revision.
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
Yanlai Yang, Matt Jones
We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Networks typically suffer from catastrophic interference when trained on a sequence of documents; however, we discover a curious and remarkable property of LLMs finetuned sequentially in this setting: they exhibit anticipatory behavior, recovering from forgetting on documents before encountering them again. This behavior occurs even though the documents are never presented in context together, and it emerges and becomes more robust as the number of parameters scales up. Through comprehensive experiments and visualizations, we demonstrate a new mechanism by which over-parameterized neural networks can recover from catastrophic interference and uncover new insights into training over-parameterized networks in cyclically structured environments.
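The cyclic protocol above can be made concrete with a toy harness. This is illustrative only: the paper finetunes LLMs on documents, whereas here the "model" is just a per-document loss table with hand-coded interference, and all numbers are assumptions. The point is the measurement: each document's loss is logged right before it is trained on, so anticipatory recovery would appear as a loss drop before the document reappears.

```python
# Toy sketch of the cyclic-training protocol (not the paper's LLM setup):
# documents are visited in a fixed, repeated order, and we record each
# document's loss just before training on it.

def run_cyclic(num_docs=3, num_cycles=2):
    loss = {d: 1.0 for d in range(num_docs)}   # start with high loss everywhere
    history = []                               # (doc, loss right before training on it)
    for _ in range(num_cycles):
        for d in range(num_docs):              # fixed, repeated document order
            history.append((d, round(loss[d], 2)))
            loss[d] = 0.1                      # training on d fits d well...
            for other in loss:
                if other != d:
                    # ...while interfering with the other documents
                    loss[other] = min(1.0, loss[other] + 0.3)
    return history

print(run_cyclic())
# [(0, 1.0), (1, 1.0), (2, 1.0), (0, 0.7), (1, 0.7), (2, 0.7)]
```

In this dummy dynamics the pre-revisit loss simply reflects interference; in the paper's finding, the measured loss for an upcoming document instead starts falling *before* it is revisited.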
Hierarchical classification at multiple operating points
Figure 4: Impact of loss hyper-parameters on the trade-off with iNat21-Mini (correct vs. recall). Label smoothing and HXE achieve their best accuracy when set to zero, which is equivalent to a flat softmax. The softmax-margin loss with C(y, ŷ) = 1 − Correct(y, ŷ) performs best using scaling factor α = 5. Table 3 outlines the parametrisation that corresponds to each loss function. The loss functions that use a sigmoid do not guarantee a valid distribution on the class hierarchy (eq. Note that we use confidence threshold inference for all loss functions, regardless of the inference function that was used in the original publication.
A Appendix
A.1 UniBench Implementation Details We have developed UniBench as an easy-to-run library that allows researchers to systematically compare and contrast existing (n=59) and new VLMs on 53 benchmarks. To evaluate new VLMs beyond the 59 already implemented, users need to follow Code Snippet 2. Users create a class that inherits from ClipModel from uni_bench.models_zoo. A.2 Natural Language Output Models on UniBench As described in Section 2.2, LLM-style models are defined as models that generate tokens/text as output, making them hard to compare with CLIP-style VLMs. In UniBench, we also incorporated LLM-style models in a controlled experiment.
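A minimal sketch of wrapping a new VLM, following the subclassing pattern the text describes. The base class name ClipModel and module uni_bench.models_zoo come from the text, but the method names (encode_image / encode_text) and their signatures are assumptions for illustration, so a stand-in base class is stubbed here instead of importing the real library.

```python
# Hypothetical sketch of adding a new VLM to UniBench. The real base class is
# ClipModel from uni_bench.models_zoo; a stand-in is defined here, and the
# encode_* method names are assumptions, not the library's actual interface.

class ClipModel:                      # stand-in for uni_bench.models_zoo.ClipModel
    def encode_image(self, images):
        raise NotImplementedError
    def encode_text(self, texts):
        raise NotImplementedError

class MyVLM(ClipModel):
    """Toy model: 'embeds' each input as its length (placeholder logic)."""
    def encode_image(self, images):
        return [len(x) for x in images]
    def encode_text(self, texts):
        return [len(t) for t in texts]

model = MyVLM()
print(model.encode_text(["a cat", "a photo of a dog"]))  # [5, 16]
```

With the real library, the subclass would be registered with UniBench's evaluation harness instead of being called directly.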
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each protocol, bearing a non-trivial computational cost, and making sense of how all these benchmarks translate into meaningful axes of progress. To facilitate a systematic evaluation of VLM progress, we introduce UniBench: a unified implementation of 50+ VLM benchmarks spanning a range of carefully categorized vision-centric capabilities from object recognition to spatial awareness, counting, and much more. We showcase the utility of UniBench for measuring progress by evaluating nearly 60 publicly available vision-language models, trained on scales of up to 12.8B samples. We find that while scaling training data or model size can boost many vision-language model capabilities, scaling offers little benefit for reasoning or relations.
Certified Adversarial Robustness with Additive Noise
Bai Li, Changyou Chen, Wenlin Wang, Lawrence Carin
The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although a significant body of work on developing defensive models has been considered, most such models are heuristic and are often vulnerable to adaptive attacks. Defensive methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness when a large-scale model and data are present. To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation for constructing adversarial examples. We establish a connection between robustness against adversarial perturbation and additive random noise, and propose a training strategy that can significantly improve the certified bounds. Our evaluation on MNIST, CIFAR-10 and ImageNet suggests that the proposed method is scalable to complicated models and large data sets, while providing competitive robustness to state-of-the-art provable defense methods.
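The connection between additive random noise and robustness can be illustrated with a generic smoothing sketch: classify many Gaussian-perturbed copies of the input and take a majority vote. This is not the paper's certification procedure or bound, just a minimal sketch of the noise-based prediction idea; the base classifier and all parameters below are toy assumptions.

```python
import numpy as np

# Generic sketch of classification under additive Gaussian noise
# (illustrative of the noise/robustness connection; not the paper's
# exact certification method).

def smoothed_predict(classify, x, sigma=0.25, n_samples=1000, seed=0):
    """Return the class voted most often when x is perturbed by N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)
        c = classify(noisy)
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)

# Toy base classifier: sign of the coordinate sum.
classify = lambda z: int(z.sum() > 0)
x = np.array([0.5, 0.4])              # well inside class 1
print(smoothed_predict(classify, x))  # 1
```

Intuitively, when the smoothed classifier's vote margin is large, small input perturbations cannot flip the majority class, which is what certified analyses of additive-noise defenses make precise.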
Algorithm 1: Pseudocode of PIC in a PyTorch-like style. Algorithm 2: Pseudocode of supervised image classification in a PyTorch-like style.
Algorithms 1 and 2 show the pseudocode of PIC and traditional supervised classification in a PyTorch-like style, respectively, showing that PIC can be easily adapted from supervised classification by modifying only a few lines of code. When we adopt the recent sampling strategy, those instance examples not included in the recent iterations have zero gradient during training. Pre-training We follow similar augmentations to Chen et al. [5], adopting random resize-and-crop, random flip, strong color distortion, and Gaussian blur as the data augmentations; the only difference is that we set the crop scale to 0.2, following Chen et al. [6]. We use Stochastic Gradient Descent (SGD) as our optimizer, with weight decay of 0.0001 and momentum of 0.9. We adopt a batch size of 512 across 8 GPUs (64 per GPU).
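The "few lines changed" point can be sketched concretely: PIC reuses the supervised cross-entropy pipeline, but each example's label is simply its own instance index, and the classifier has one class per training instance. This is a toy NumPy version under those assumptions, not the paper's PyTorch pseudocode.

```python
import numpy as np

# Sketch of parametric instance classification (PIC) as supervised
# classification with label = instance index (toy NumPy version).

def cross_entropy(logits, labels):
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

num_instances, dim = 8, 4
rng = np.random.default_rng(0)
features = rng.standard_normal((num_instances, dim))      # backbone outputs
W = rng.standard_normal((dim, num_instances))             # one class per instance

logits = features @ W
labels = np.arange(num_instances)   # <-- the PIC change: label = own instance index
loss = cross_entropy(logits, labels)
print(loss > 0)   # True
```

Everything except the label assignment and classifier width is unchanged from a standard supervised loop, which is the adaptation the algorithms above highlight.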
Parametric Instance Classification for Unsupervised Visual Feature Learning
This paper presents parametric instance classification (PIC) for unsupervised visual feature learning. Unlike the state-of-the-art approaches, which perform instance discrimination in a dual-branch non-parametric fashion, PIC directly performs one-branch parametric instance classification, yielding a simple framework similar to supervised classification and without the need to address the information-leakage issue. We show that the simple PIC framework can be as effective as the state-of-the-art approaches, i.e., SimCLR and MoCo v2, by adapting several common component settings used in those approaches. We also propose two novel techniques to further improve the effectiveness and practicality of PIC: 1) a sliding-window data scheduler, instead of the previous epoch-based data scheduler, which addresses the extremely infrequent instance-visiting issue in PIC and improves effectiveness; 2) a negative sampling and weight update correction approach to reduce training time and GPU memory consumption, which also enables application of PIC to almost unlimited training images. We hope that the PIC framework can serve as a simple baseline to facilitate future study. The code and network configurations are available at https://github.com/bl0/PIC.
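The sliding-window scheduler contrasted above with an epoch-based one can be sketched as follows: instead of sampling each instance once per epoch, batches are drawn from a window that slides over the dataset, so each instance is revisited within a bounded span of iterations. The window size and stride here are illustrative assumptions, not the paper's settings.

```python
# Toy sketch of a sliding-window data scheduler (parameters are
# illustrative assumptions, not the paper's configuration).

def sliding_window_batches(num_instances, window, stride, num_steps):
    """Yield index windows; consecutive windows overlap by window - stride."""
    start = 0
    for _ in range(num_steps):
        yield [(start + i) % num_instances for i in range(window)]
        start = (start + stride) % num_instances

batches = list(sliding_window_batches(num_instances=10, window=4, stride=2, num_steps=3))
print(batches)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Because each instance stays inside the window for several consecutive steps, it is visited far more frequently than under an epoch-based scheduler, which is the infrequent-visiting issue the scheduler targets.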
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation
Can Qin, Haoxuan You, Lichen Wang, C.-C. Jay Kuo, Yun Fu
Domain Adaptation (DA) approaches have achieved significant improvements in a wide range of machine learning and computer vision tasks (e.g., classification, detection, and segmentation). However, as far as we are aware, few methods achieve domain adaptation directly on 3D point cloud data. The unique challenge of point cloud data lies in its abundant spatial geometric information, with the semantics of the whole object contributed by its regional geometric structures. Consequently, most general-purpose DA methods, which strive for global feature alignment and ignore local geometric information, are not suitable for 3D domain alignment. In this paper, we propose a novel 3D Domain Adaptation Network for point cloud data (PointDAN).