VHELM: A Holistic Evaluation of Vision Language Models
Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and in the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs and present the Holistic Evaluation of Vision Language Models (VHELM). VHELM aggregates various datasets to cover one or more of nine aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety. In doing so, we produce a comprehensive, multi-dimensional view of the capabilities of VLMs across these important factors.
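Aggregating heterogeneous datasets into per-aspect scores can be sketched roughly as follows. This is a minimal illustration, not VHELM's actual pipeline: the model names, aspect names, and scores below are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical per-dataset scores keyed by (model, aspect); each aspect
# may be covered by several datasets, so values are lists.
scores = {
    ("model-a", "fairness"): [0.71, 0.65],
    ("model-a", "toxicity"): [0.90],
    ("model-b", "fairness"): [0.60, 0.58],
    ("model-b", "toxicity"): [0.95],
}

def aspect_profile(scores):
    """Average each model's dataset scores per aspect, yielding one
    multi-dimensional row per model."""
    profile = defaultdict(dict)
    for (model, aspect), values in scores.items():
        profile[model][aspect] = sum(values) / len(values)
    return dict(profile)

profile = aspect_profile(scores)
```

Averaging within an aspect before comparing models is what makes the view "holistic": a model cannot hide a weak aspect behind a strong one.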
Large Language Model-Based Generation of Discharge Summaries
Rodrigues, Tiago, Lopes, Carla Teixeira
Discharge Summaries are documents written by medical professionals that detail a patient's visit to a care facility. They contain a wealth of information crucial for patient care, and automating their generation could significantly reduce the effort required from healthcare professionals, minimize errors, and ensure that critical patient information is easily accessible and actionable. In this work, we explore the use of five Large Language Models on this task, from open-source models (Mistral, Llama 2) to proprietary systems (GPT-3, GPT-4, Gemini 1.5 Pro), leveraging MIMIC-III summaries and notes. We evaluate them using exact-match, soft-overlap, and reference-free metrics. Our results show that proprietary models, particularly Gemini with one-shot prompting, outperformed others, producing summaries with the highest similarity to the gold-standard ones. Open-source models, while promising, especially Mistral after fine-tuning, lagged in performance, often struggling with hallucinations and repeated information. Human evaluation by a clinical expert confirmed the practical utility of the summaries generated by proprietary models. Despite the challenges, such as hallucinations and missing information, the findings suggest that LLMs, especially proprietary models, are promising candidates for automatic discharge summary generation as long as data privacy is ensured.
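The soft-overlap family of metrics mentioned above can be illustrated with a unigram-overlap F1 (a simple stand-in for metrics such as ROUGE-1; this sketch is not the paper's exact evaluation code, and the tokenization is deliberately naive).

```python
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a gold one.
    Counter intersection clips repeated tokens to their minimum count."""
    gen, ref = generated.lower().split(), reference.lower().split()
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Example: partial overlap between a candidate and a gold summary.
score = token_f1("the patient was discharged", "patient discharged home")
```

Exact-match metrics demand identical strings, while reference-free metrics drop the gold summary altogether; soft overlap sits between the two, rewarding partial agreement.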
Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models
Chatterjee, Sreejato, Tran, Linh, Nguyen, Quoc Duy, Kirson, Roni, Hamlin, Drue, Aquino, Harvest, Lyu, Hanjia, Luo, Jiebo, Dye, Timothy
Abstract: Traditional efforts to measure historical structural oppression struggle with cross-national validity due to the unique, locally specified histories of exclusion, colonization, and social status in each country, and they have often relied on structured indices that privilege material resources while overlooking lived, identity-based exclusion. We introduce a novel framework for oppression measurement that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured self-identified ethnicity utterances from a multilingual COVID-19 global study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimations of oppression. We systematically evaluate these strategies across multiple state-of-the-art LLMs. Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens for understanding how oppression manifests in data-driven research and public health contexts.

The study of racial and ethnic inequality remains central to sociological research, with an extensive literature documenting how structural oppression is reproduced in historical and contemporary contexts [1]-[3]. Oppression can be understood as a social hierarchy in which some groups subject others to lower status and to systemic exclusion, dehumanization, and disadvantage. In public health and sociology, this understanding of oppression is closely aligned with definitions of systemic and structural racism, which describe racism as deeply embedded in laws, policies, institutional practices, and social norms that sustain widespread inequities, violence, and disadvantage over time [1].
Foundational works have demonstrated how ethnic and national hierarchies shape access to power, life opportunities, autonomy, and sovereignty, primarily through institutionalized mechanisms such as legal structures, educational systems, and healthcare access, among others [2].
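The rule-guided prompting strategy described above can be sketched as a prompt builder plus a constrained parser. The rule wording, response format, and score range here are illustrative assumptions, not the paper's actual rubric.

```python
import re

# Illustrative rule set; the study's real rubric and phrasing differ.
RULES = [
    "Score 0 (no historical disadvantage) to 1 (severe historical disadvantage).",
    "Ground the score in documented exclusion, colonization, or status hierarchies.",
    "If the utterance gives too little context, answer SCORE: ABSTAIN.",
]

def build_prompt(ethnicity_utterance: str, country: str) -> str:
    """Assemble a rule-guided prompt for a hypothetical LLM scoring call."""
    rules = "\n".join(f"- {r}" for r in RULES)
    return (
        f"Rules:\n{rules}\n\n"
        f"Country: {country}\n"
        f"Self-identified ethnicity: {ethnicity_utterance}\n"
        "Answer with 'SCORE: <number>' or 'SCORE: ABSTAIN'."
    )

def parse_score(reply: str):
    """Extract the numeric score from a model reply; return None on
    abstention or malformed output rather than guessing."""
    match = re.search(r"SCORE:\s*([0-9.]+|ABSTAIN)", reply)
    if match is None or match.group(1) == "ABSTAIN":
        return None
    return float(match.group(1))
```

Making the rules explicit in the prompt, and parsing only a constrained response format, is what keeps the resulting scores interpretable and comparable across models.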
VHELM: A Holistic Evaluation of Vision Language Models
Lee, Tony, Tu, Haoqin, Wong, Chi Heem
Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast. Our initial run evaluates 22 VLMs on 21 existing datasets to provide a holistic snapshot of the models. We uncover new key findings, such as the fact that efficiency-focused models (e.g., Claude 3 Haiku or Gemini 1.5 Flash) perform significantly worse than their full counterparts on the bias benchmark but not on the other aspects.
FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment
Sahili, Zahraa Al, Fetanat, Maryam, Nowaz, Maimuna, Patras, Ioannis, Purver, Matthew
Text-to-image (T2I) systems lack simple, reproducible ways to evaluate how well images match prompts and how models treat social attributes. Common proxies (face classifiers and contrastive similarity) reward surface cues, lack calibrated abstention, and miss attributes only weakly visible (for example, religion, culture, disability). We present FairJudge, a lightweight protocol that treats instruction-following multimodal LLMs as fair judges. It scores alignment with an explanation-oriented rubric mapped to [-1, 1]; constrains judgments to a closed label set; requires evidence grounded in the visible content; and mandates abstention when cues are insufficient. Unlike CLIP-only pipelines, FairJudge yields accountable, evidence-aware decisions; unlike mitigation that alters generators, it targets evaluation fairness. We evaluate gender, race, and age on FairFace, PaTA, and FairCoT; extend to religion, culture, and disability; and assess profession correctness and alignment on IdenProf, FairCoT-Professions, and our new DIVERSIFY-Professions. We also release DIVERSIFY, a 469-image corpus of diverse, non-iconic scenes. Across datasets, judge models outperform contrastive and face-centric baselines on demographic prediction and improve mean alignment while maintaining high profession accuracy, enabling more reliable, reproducible fairness audits.
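The closed-label-set and rubric-mapping parts of such a judging protocol can be sketched as follows. The label set, rubric levels, and abstention token are illustrative assumptions, not FairJudge's exact specification.

```python
# Closed label set with an explicit abstention option, and a discrete
# rubric mapped onto the [-1, 1] alignment scale (both hypothetical).
GENDER_LABELS = {"male", "female", "unsure"}
RUBRIC = {0: -1.0, 1: -0.5, 2: 0.0, 3: 0.5, 4: 1.0}

def judge_attribute(raw_label: str) -> str:
    """Constrain a free-text judge reply to the closed label set,
    falling back to abstention ('unsure') for anything else."""
    label = raw_label.strip().lower()
    return label if label in GENDER_LABELS else "unsure"

def alignment_score(rubric_level: int) -> float:
    """Map an explanation-oriented rubric level onto [-1, 1]."""
    return RUBRIC[rubric_level]
```

Forcing every judgment into a closed set, with abstention as a first-class outcome, is what makes the audit reproducible: two runs of the protocol cannot disagree merely because one judge phrased its answer differently.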