A Appendix A.1 UniBench Implementation Details We have developed UniBench
–Neural Information Processing Systems
To evaluate new VLMs that expand beyond the already implemented 59 VLMs, users need to follow Code Snippet 2. Users would need to create a class that inherent from As described in Section 2.2, LLM-style models defined as models that generate tokens/text as output. Thereby, making them hard to compare with CLIP-style VLMs. Following Matsuura et al. [2023] methodology, we evaluated Llava 1.5 [Liu et al., 2023] - a LLM-style VLM - on various benchmark types in UniBench (Table 2). Scaling improves many benchmarks, but offers little benefit for reasoning and relation. Figure 8: Benchmark capabilities performance does not scale with dataset and model size Median zero-shot performance of models on various benchmark capabilities.
Neural Information Processing Systems
Oct-10-2025, 10:21:39 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Large Language Model (0.87)
- Vision (1.00)
- Information Technology > Artificial Intelligence