Transformers and ConvNets Using Counterfactual Simulation Testing

Neural Information Processing Systems 

We observe an even stronger tendency for Swin to conserve initial predictions under partial occlusion. We show our experiment in Figure 2. We find very similar conclusions, ConvNext to object features in the canonical pose.

Here we present more details about the proposed NVD dataset. Next, in Figure 1, we present a non-exhaustive showcase of the 92 object models contained in NVD.

Unfortunately, Swin V2 architectures are exclusively available for inference on images of size at least 256x256.
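To illustrate what "conserving initial predictions under partial occlusion" could mean quantitatively, the minimal sketch below computes the fraction of images whose predicted label is unchanged once an occluder is added. The function name `conservation_rate` and the toy labels are hypothetical and not taken from the paper; the actual evaluation protocol may differ.

```python
from typing import Sequence


def conservation_rate(clean_preds: Sequence[int],
                      occluded_preds: Sequence[int]) -> float:
    """Fraction of samples whose predicted label is unchanged by occlusion.

    Hypothetical helper for illustration only: clean_preds[i] is the label
    predicted for image i without occlusion, occluded_preds[i] the label
    predicted for the same image with a partial occluder.
    """
    if len(clean_preds) != len(occluded_preds):
        raise ValueError("prediction lists must have the same length")
    if len(clean_preds) == 0:
        raise ValueError("need at least one prediction")
    # Count the positions where the two predictions agree.
    unchanged = sum(c == o for c, o in zip(clean_preds, occluded_preds))
    return unchanged / len(clean_preds)


# Toy example: the prediction flips for one of four images.
rate = conservation_rate([1, 2, 3, 4], [1, 2, 0, 4])
```

Under this reading, a higher rate means the model sticks to its initial prediction more often. It does not indicate whether that prediction was correct, so it would normally be reported alongside accuracy.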