computational
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming
Generative models have demonstrated human-level proficiency in various benchmarks across domains like programming, natural sciences, and general knowledge. Despite these promising results on competitive benchmarks, they still struggle with seemingly simple problem-solving tasks typically carried out by elementary-level students. How do state-of-the-art models perform on standardized programming-related tests designed to assess computational thinking and problem-solving skills at schools? In this paper, we curate a novel benchmark involving computational thinking tests grounded in elementary visual programming domains. Our initial results show that state-of-the-art models like GPT-4o and Llama3 barely match the performance of an average school student. To further boost the performance of these models, we fine-tune them using a novel synthetic data generation methodology. The key idea is to develop a comprehensive dataset using symbolic methods that capture different skill levels, ranging from recognition of visual elements to multi-choice quizzes to synthesis-style tasks. We showcase how various aspects of symbolic information in synthetic data help improve fine-tuned models' performance. We will release the full implementation and datasets to facilitate further research on enhancing computational thinking in generative models.
Liberating Logic in the Age of AI: Going Beyond Programming with Computational Thinking
Schmidt, Douglas C., Runfola, Dan
Mastering one or more programming languages has historically been the gateway to implementing ideas on a computer. Today, that gateway is widening with advances in large language models (LLMs) and artificial intelligence (AI)-powered coding assistants. What matters is no longer just fluency in traditional programming languages but the ability to think computationally by translating problems into forms that can be solved with computing tools. The capabilities enabled by these AI-augmented tools are rapidly leading to the commoditization of computational thinking, such that anyone who can articulate a problem in natural language can potentially harness computing power via AI. This shift is poised to radically influence how we teach computer science and data science in the United States and around the world. Educators and industry leaders are grappling with how to adapt: What should students learn when the hottest new programming language is English? How do we prepare a generation of computational thinkers who need not code every algorithm manually, but must still think critically, design solutions, and verify AI-augmented results? This paper explores these questions, examining the impact of natural language programming on software development, the emerging distinction between programmers and prompt-crafting problem solvers, the reforms needed in computer science and data science curricula, and the importance of maintaining our fundamental computational science principles in an AI-augmented future. Along the way, we compare approaches and share best practices for embracing this new paradigm in computing education.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- (5 more...)
- Education > Curriculum > Subject-Specific Education (1.00)
- Education > Educational Setting > Higher Education (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
We will fix all minor comments and typos without explicitly addressing them in the rebuttal
We will fix all minor comments and typos without explicitly addressing them in the rebuttal. Practical Impact: Our primary aim in this work is indeed theoretical. Appendix, but we will add a reference to it in the main paper. P AC T erminology: We have assumed that readers will be familiar with standard terminology from P AC learning. Non-trivial Class: The definition of non-trivial class appears just before the statement of Theorem 5 (in lines 182-183).
they pointed out the importance of the problem and of our analysis given the increasing popularity of computational
We would like to thank the reviewers for their efforts on reading and evaluating our paper . In what follows, we provide specific responses to each reviewer. We thank R1 for a very detailed evaluation and truly helpful feedback on our empirical analysis. The hyperparameter tuning [...] and initialization steps [...] are not explained clearly enough We will move necessary Can't we apply the same strategy in the Frank-W olfe algorithm of SRW [Alg.2, 64]: Y es, we can apply the same Indeed, we made good use of the open source code they have provided to inspire our implementation. Is it possible to extend this result to PRW?
MazeMate: An LLM-Powered Chatbot to Support Computational Thinking in Gamified Programming Learning
Hou, Chenyu, Yu, Hua, Zhu, Gaoxia, Anas, John Derek, Liu, Jiao, Ong, Yew Soon
Computational Thinking (CT) is a foundational problem-solving skill, and gamified programming environments are a widely adopted approach to cultivating it. While large language models (LLMs) provide on-demand programming support, current applications rarely foster CT development. We present MazeMate, an LLM-powered chatbot embedded in a 3D Maze programming game, designed to deliver adaptive, context-sensitive scaffolds aligned with CT processes in maze solving and maze design. We report on the first classroom implementation with 247 undergraduates. Students rated MazeMate as moderately helpful, with higher perceived usefulness for maze solving than for maze design. Thematic analysis confirmed support for CT processes such as decomposition, abstraction, and algorithmic thinking, while also revealing limitations in supporting maze design, including mismatched suggestions and fabricated algorithmic solutions. These findings demonstrate the potential of LLM-based scaffolding to support CT and underscore directions for design refinement to enhance MazeMate usability in authentic classrooms.
- Asia > Singapore (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Instructional Material (1.00)
- Education > Educational Setting (1.00)
- Education > Curriculum > Subject-Specific Education (0.94)
Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning
We introduce a simple, yet novel entropy-based framework to drive token efficiency in large language models during reasoning tasks. Our approach uses Shannon entropy from token-level logprobs as a confidence signal to enable early stopping, achieving 25-50% computational savings while maintaining task accuracy. Crucially, we demonstrate that entropy-based confidence calibration represents an emergent property of advanced post-training optimization present in modern reasoning models but notably absent in standard instruction-tuned and pre-trained models (Llama 3.3 70B). We show that the entropy threshold to stop reasoning varies from model to model but can be calculated easily in one shot using only a few examples from existing reasoning datasets. Our results indicate that advanced reasoning models often know that they've gotten a correct answer early on, and that this emergent confidence awareness can be exploited to save tokens and reduce latency. The framework demonstrates consistent performance across reasoning-optimized model families with 25-50% computational cost reduction while preserving accuracy, revealing that confidence mechanisms represent a distinguishing characteristic of modern post-trained reasoning systems versus their predecessors.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.72)
Comparative Analysis of STEM and non-STEM Teachers' Needs for Integrating AI into Educational Environments
Riahi, Bahare, Catete, Veronica
There is an increasing imperative to integrate programming platforms within AI frameworks to enhance educational tasks for both teachers and students. However, commonly used platforms such as Code.org, Scratch, and Snap fall short of providing the desired AI features and lack adaptability for interdisciplinary applications. This study explores how educational platforms can be improved by incorporating AI and analytics features to create more effective learning environments across various subjects and domains. We interviewed 8 K-12 teachers and asked their practices and needs while using any block-based programming (BBP) platform in their classes. We asked for their approaches in assessment, course development and expansion of resources, and student monitoring in their classes. Thematic analysis of the interview transcripts revealed both commonalities and differences in the AI tools needed between the STEM and non-STEM groups. Our results indicated advanced AI features that could promote BBP platforms. Both groups stressed the need for integrity and plagiarism checks, AI adaptability, customized rubrics, and detailed feedback in assessments. Non-STEM teachers also emphasized the importance of creative assignments and qualitative assessments. Regarding resource development, both AI tools desired for updating curricula, tutoring libraries, and generative AI features. Non-STEM teachers were particularly interested in supporting creative endeavors, such as art simulations. For student monitoring, both groups prioritized desktop control, daily tracking, behavior monitoring, and distraction prevention tools. Our findings identify specific AI-enhanced features needed by K-12 teachers across various disciplines and lay the foundation for creating more efficient, personalized, and engaging educational experiences.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.06)
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- (7 more...)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report > New Finding (0.87)
- Information Technology > Security & Privacy (1.00)
- Education > Educational Setting > K-12 Education (1.00)
- Education > Curriculum > Subject-Specific Education (1.00)
- (2 more...)