
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming

Neural Information Processing Systems

Generative models have demonstrated human-level proficiency in various benchmarks across domains like programming, natural sciences, and general knowledge. Despite these promising results on competitive benchmarks, they still struggle with seemingly simple problem-solving tasks typically carried out by elementary-level students. How do state-of-the-art models perform on standardized programming-related tests designed to assess computational thinking and problem-solving skills at schools? In this paper, we curate a novel benchmark involving computational thinking tests grounded in elementary visual programming domains. Our initial results show that state-of-the-art models like GPT-4o and Llama3 barely match the performance of an average school student.


The ArtBench Dataset: Benchmarking Generative Models with Artworks - Technology Org

#artificialintelligence

Deep generative models can synthesize diverse, high-fidelity images. Computational understanding of art is attracting growing attention because of its importance for art history, computational creativity, and human-computer interaction. New research proposes using artworks to benchmark generative AI models. The ArtBench dataset comprises 60,000 images annotated with 10 artistic styles, such as Baroque and Surrealism. The images are high quality, with clean and balanced labels, and can be easily incorporated into commonly used deep learning frameworks.