AITopics | answer index

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks

Sampat, Shailaja Keyur, Nakamura, Mutsumi, Kailas, Shankar, Aggarwal, Kartik, Zhou, Mandy, Yang, Yezhou, Baral, Chitta

arXiv.org Artificial Intelligence Oct-17-2024

Deriving inferences from heterogeneous inputs (such as images, text, and audio) is an important skill that humans use to perform day-to-day tasks. A similar ability is desirable in advanced Artificial Intelligence (AI) systems. While state-of-the-art models are rapidly closing the gap with human-level performance on diverse computer vision and NLP tasks taken separately, they struggle with tasks that require joint reasoning over visual and textual modalities. Inspired by GLUE (Wang et al., 2018), a multitask benchmark for natural language understanding, we propose VL-GLUE in this paper. VL-GLUE consists of over 100k samples spanning seven different tasks, all of which at their core require visuo-linguistic reasoning. Moreover, our benchmark comprises diverse image types (from synthetically rendered figures and day-to-day scenes to charts and complex diagrams) and a broad variety of domain-specific text (from cooking, politics, and sports to high-school curricula), demonstrating the need for multimodal understanding in the real world. We show that this benchmark is quite challenging for existing large-scale vision-language models, and we encourage the development of systems with robust visuo-linguistic reasoning capabilities.

benchmark, large language model, machine learning

2410.13666

Country:

North America > Puerto Rico (0.04)
Asia > Pakistan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre:

Research Report (0.70)
Overview (0.68)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Education > Educational Setting > K-12 Education (0.87)
Leisure & Entertainment > Sports > Golf (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Measuring Vision-Language STEM Skills of Neural Models

Shen, Jianhao, Yuan, Ye, Mirzoyan, Srbuhi, Zhang, Ming, Wang, Chenguang

arXiv.org Artificial Intelligence May-22-2024

We introduce a new challenge to test the STEM skills of neural models. Real-world problems often require solutions that combine knowledge from STEM (science, technology, engineering, and math). Unlike existing datasets, our dataset requires understanding multimodal vision-language information in STEM. It is one of the largest and most comprehensive datasets for this challenge, with 448 skills and 1,073,146 questions spanning all STEM subjects. Whereas existing datasets often focus on examining expert-level ability, our dataset includes fundamental skills and questions designed based on the K-12 curriculum. We also include state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo in our benchmark. Results show that recent model advances help master only a very limited number of lower grade-level skills (2.5% in the third grade) in our dataset. In fact, these models remain well below (averaging 54.7%) the performance of elementary students, let alone near expert-level performance. To understand and improve performance on our dataset, we train the models on a training split of our dataset. Although we observe improved performance, the models' performance remains relatively low compared with that of average elementary students. Solving STEM problems will require novel algorithmic innovations from the community.

answer index, conference paper, dataset

2402.17205

Country:

North America > United States > California (0.04)
Asia > China (0.04)
South America > Ecuador (0.04)

Genre:

Research Report > New Finding (0.34)

Industry:

Energy (1.00)
Materials (0.92)
Leisure & Entertainment > Games > Computer Games (0.92)
Education > Educational Setting > K-12 Education > Primary School (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
