A Appendix

Neural Information Processing Systems 

A.1 UniBench Implementation Details

We developed UniBench as an easy-to-run library that lets researchers systematically compare and contrast existing (n=59) and new VLMs on 53 benchmarks. To evaluate a new VLM beyond the 59 already implemented, users need to follow Code Snippet 2. Users need to create a class that inherits from ClipModel in uni_bench.models_zoo.

A.2 Natural Language Output Models on UniBench

As described in Section 2.2, LLM-style models are defined as models that generate tokens/text as output, which makes them hard to compare with CLIP-style VLMs. In UniBench, we also incorporated LLM-style models in a control experiment.
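
To make the extension step from A.1 concrete, the subclassing pattern can be sketched as follows. This is a minimal, self-contained illustration and not the library's actual interface: the `ClipModel` class below is a local stand-in defined only for this example (the real base class lives in uni_bench.models_zoo and may expose different methods), and the names `get_image_embeddings`, `get_text_embeddings`, `zero_shot_predict`, and `ToyVLM` are all hypothetical. The sketch assumes that a new model only has to supply image and text embeddings, while the shared base class handles cosine-similarity zero-shot prediction.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence

Vector = List[float]


def _normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class ClipModel(ABC):
    """Local stand-in for illustration only.

    This is NOT the real uni_bench.models_zoo.ClipModel; the method
    names below are assumptions made for this sketch.
    """

    @abstractmethod
    def get_image_embeddings(self, images: Sequence[Sequence[float]]) -> List[Vector]:
        """Return one unit-norm embedding per image."""

    @abstractmethod
    def get_text_embeddings(self, captions: Sequence[str]) -> List[Vector]:
        """Return one unit-norm embedding per caption."""

    def zero_shot_predict(
        self, images: Sequence[Sequence[float]], class_prompts: Sequence[str]
    ) -> List[int]:
        """Pick, for each image, the prompt with the highest cosine similarity."""
        image_embs = self.get_image_embeddings(images)
        text_embs = self.get_text_embeddings(class_prompts)
        predictions = []
        for img in image_embs:
            sims = [sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
            predictions.append(sims.index(max(sims)))
        return predictions


class ToyVLM(ClipModel):
    """Hypothetical new model: images are already feature vectors, and
    text embeddings come from a fixed lookup table."""

    def __init__(self, text_table: Dict[str, Sequence[float]]) -> None:
        self.text_table = text_table

    def get_image_embeddings(self, images: Sequence[Sequence[float]]) -> List[Vector]:
        return [_normalize(x) for x in images]

    def get_text_embeddings(self, captions: Sequence[str]) -> List[Vector]:
        return [_normalize(self.text_table[c]) for c in captions]


prompts = ["a photo of a cat", "a photo of a dog"]
model = ToyVLM({prompts[0]: [1.0, 0.0], prompts[1]: [0.0, 1.0]})
predictions = model.zero_shot_predict([[0.9, 0.1], [0.2, 0.8]], prompts)
print(predictions)
```

Under these assumptions, only the two embedding methods differ between models, so the evaluation logic is written once in the base class and reused by every subclass. An actual integration would replace the stand-in with the library's own base class and follow Code Snippet 2.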