 Instructional Material


Equity in the Use of ChatGPT for the Classroom: A Comparison of the Accuracy and Precision of ChatGPT 3.5 vs. ChatGPT4 with Respect to Statistics and Data Science Exams

arXiv.org Artificial Intelligence

The association of social mobility with a college education has been studied since the early 1950s [1]. Although there are some indications that a college education is not as effective as it once was in helping graduates climb the social ladder [2], it is still the most reliable way of doing so. US News & World Report updated its rankings in 2023 to include social mobility [3], and many institutions of higher education are paying more attention to recruitment of first-generation college students and talented students from disadvantaged backgrounds. With the inclusion of such students in the typical college class come some important considerations. For example, a student from difficult financial circumstances, with an academic background to match the profile of any student at an elite institution, will have more difficulty paying for textbooks, a laptop, a smartphone, and other items that are almost essential to current college life [2]. As of November 2022, one such item that students from advantaged backgrounds will have access to that those from lower income brackets will not is ChatGPT4 [4]. It currently costs $20 per month for a subscription and has been called a "significant leap forward" compared to ChatGPT3.5 [5], which is free [6]. While use of generative AI is prohibited in some college classrooms, this is hard to police, and many students use it regardless of classroom restrictions [7]. When generative AI is allowed, there is a wide array of platforms from which students can choose.


Introduction to AI Planning

arXiv.org Artificial Intelligence

These are notes for lectures presented at the University of Stuttgart that provide an introduction to key concepts and techniques in AI Planning. Artificial Intelligence Planning, also known as Automated Planning, emerged around 1966 from the need to give autonomy to a wheeled robot. Since then, it has evolved into a flourishing research and development discipline, often associated with scheduling. Over the decades, various approaches to planning have been developed with characteristics that make them appropriate for specific tasks and applications. Most approaches represent the world as a state within a state transition system; the planning problem then becomes that of searching for a path in the state space from the current state to one that satisfies the goals of the user. The notes begin by introducing the state model and move on to exploring classical planning, the foundational form of planning, and present fundamental algorithms for solving such problems. Subsequently, we examine planning as a constraint satisfaction problem, outlining the mapping process and describing an approach to solve such problems. The most extensive section is dedicated to Hierarchical Task Network (HTN) planning, one of the most widely used and powerful planning techniques in the field. The lecture notes end with a bonus chapter on the Planning Domain Definition Language (PDDL), the de facto standard syntax for representing non-hierarchical planning problems.
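The state-space view described in this abstract can be illustrated with a small sketch. It is not taken from the lecture notes: the function `plan`, the toy robot domain, and the encoding of actions as (preconditions, add effects, delete effects) are assumptions made for illustration, and breadth-first search stands in for whichever algorithms the notes actually present.

```python
from collections import deque


def plan(initial, goal, actions):
    """Breadth-first search for a sequence of action names reaching the goal.

    A state is a frozenset of facts. Each action maps a name to a tuple
    (preconditions, add_effects, delete_effects), all frozensets of facts.
    Returns a list of action names, or None if no plan exists.
    """
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return path
        for name, (pre, add, delete) in actions.items():
            if pre <= state:  # action is applicable
                successor = frozenset((state - delete) | add)
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, path + [name]))
    return None


# Hypothetical toy domain: a robot moves between rooms A and B and picks up a box.
ACTIONS = {
    "move_a_b": (frozenset({"robot_at_a"}), frozenset({"robot_at_b"}),
                 frozenset({"robot_at_a"})),
    "move_b_a": (frozenset({"robot_at_b"}), frozenset({"robot_at_a"}),
                 frozenset({"robot_at_b"})),
    "pick_box": (frozenset({"robot_at_b", "box_at_b"}), frozenset({"holding_box"}),
                 frozenset({"box_at_b"})),
}

example_plan = plan({"robot_at_a", "box_at_b"}, frozenset({"holding_box"}), ACTIONS)
print(example_plan)
```

Because the search is breadth-first, the plan it returns uses the fewest actions among all plans in this toy encoding; practical planners generally rely on heuristics rather than exhaustive expansion.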


Semi-automated analysis of audio-recorded lessons: The case of teachers' engaging messages

arXiv.org Artificial Intelligence

Engaging messages delivered by teachers are a key aspect of the classroom discourse that influences student outcomes. However, improving this communication is challenging due to difficulties in obtaining observations. This study presents a methodology for efficiently extracting actual observations of engaging messages from audio-recorded lessons. We collected 2,477 audio-recorded lessons from 75 teachers over two academic years. Using automatic transcription and keyword-based filtering analysis, we identified and classified engaging messages. This method reduced the information to be analysed by 90%, optimising the time and resources required compared to traditional manual coding. Subsequent descriptive analysis revealed that the most used messages emphasised the future benefits of participating in school activities. In addition, the use of engaging messages decreased as the academic year progressed. This study offers insights for researchers seeking to extract information from teachers' discourse in naturalistic settings and provides useful information for designing interventions to improve teachers' communication strategies.

Keywords: Teacher education; Technology; Discourse; Secondary education; Engagement

1. Introduction

Teachers' discourse has the power to shape students' outcomes (Caldarella et al., 2023; Howe & Abedin, 2013; Mercer, 2010).
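As a rough illustration of the keyword-based filtering step mentioned in the abstract above, the sketch below keeps only the transcript segments that contain at least one keyword and reports how much material was filtered out. The keyword list, the sample segments, and the function names are hypothetical; the study's actual keywords and pipeline are not given in this excerpt.

```python
import re

# Hypothetical keyword list; the study's real keywords are not given here.
KEYWORDS = ["exam", "grade", "future", "job", "university"]


def filter_segments(segments, keywords=KEYWORDS):
    """Return the transcript segments containing at least one whole-word keyword."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in segments if pattern.search(s)]


def reduction(segments, kept):
    """Fraction of segments removed by filtering (0.0 for an empty transcript)."""
    return 1 - len(kept) / len(segments) if segments else 0.0


# Invented sample segments, standing in for automatically transcribed speech.
segments = [
    "Open your books to page ten.",
    "If you work hard now, it will help you in the future.",
    "Please be quiet.",
    "Who can tell me the answer?",
]
kept = filter_segments(segments)
print(kept, reduction(segments, kept))
```

Only the segments that survive the filter would then go to human coders, which is where the saving over coding every recording by hand comes from.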


Virtual Agent-Based Communication Skills Training to Facilitate Health Persuasion Among Peers

arXiv.org Artificial Intelligence

Many laypeople are motivated to improve the health behavior of their family or friends but do not know where to start, especially if the health behavior is potentially stigmatizing or controversial. We present an approach that uses virtual agents to coach community-based volunteers in health counseling techniques, such as motivational interviewing, and allows them to practice these skills in role-playing scenarios. We use this approach in a virtual agent-based system to increase COVID-19 vaccination by empowering users to influence their social network. In a between-subjects comparative design study, we test the effects of agent system interactivity and role-playing functionality on counseling outcomes, with participants evaluated by standardized patients and objective judges. We find that all versions are effective at producing peer counselors who score adequately on a standardized measure of counseling competence, and that participants were significantly more satisfied with interactive virtual agents compared to passive viewing of the training material. We discuss design implications for interpersonal skills training systems based on our findings.


Do Tutors Learn from Equity Training and Can Generative AI Assess It?

arXiv.org Artificial Intelligence

Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale, are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications, with this present work assessing their use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. We find marginally significant learning gains with increases in tutors' self-reported confidence in their knowledge in responding to middle school students experiencing possible inequities from pretest to posttest. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors' ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine that few-shot learning using GPT-4o is the preferred approach. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves leveling the difficulty among scenarios and enhancing LLM prompts for large-scale grading and assessment.


Deep Learning Model Security: Threats and Defenses

arXiv.org Artificial Intelligence

Deep learning has transformed AI applications but faces critical security challenges, including adversarial attacks, data poisoning, model theft, and privacy leakage. This survey examines these vulnerabilities, detailing their mechanisms and impact on model integrity and confidentiality. Practical implementations, including adversarial examples, label flipping, and backdoor attacks, are explored alongside defenses such as adversarial training, differential privacy, and federated learning, highlighting their strengths and limitations. Advanced methods like contrastive and self-supervised learning are presented for enhancing robustness. The survey concludes with future directions, emphasizing automated defenses, zero-trust architectures, and the security challenges of large AI models. A balanced approach to performance and security is essential for developing reliable deep learning systems.
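Of the attacks named in this abstract, label flipping is the simplest to sketch. The example below is a generic illustration of the idea, not the survey's implementation: the function name `flip_labels`, its parameters, and the toy labels are assumptions made for illustration.

```python
import random


def flip_labels(labels, fraction, num_classes, seed=0):
    """Return a copy of `labels` with a fraction of entries changed to a wrong class.

    Assumes integer class labels in range(num_classes) and num_classes >= 2.
    """
    rng = random.Random(seed)  # fixed seed so the poisoning is reproducible
    poisoned = list(labels)
    n_flip = int(len(labels) * fraction)
    for i in rng.sample(range(len(labels)), n_flip):
        wrong_classes = [c for c in range(num_classes) if c != poisoned[i]]
        poisoned[i] = rng.choice(wrong_classes)
    return poisoned


# Toy example: poison 25% of a balanced binary label set.
clean = [0, 1] * 10
poisoned = flip_labels(clean, fraction=0.25, num_classes=2)
n_changed = sum(c != p for c, p in zip(clean, poisoned))
print(n_changed)
```

Training on the poisoned labels and comparing accuracy against a model trained on the clean labels is one common way to measure how sensitive a model is to this kind of data poisoning.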


Envisioning National Resources for Artificial Intelligence Research: NSF Workshop Report

arXiv.org Artificial Intelligence

Workshop Goals

This workshop aimed to identify initial challenges and opportunities for national resources for AI research (e.g., compute, data, models, etc.) and to facilitate planning for the envisioned National AI Research Resource (NAIRR). Participants included AI and cyberinfrastructure (CI) experts.

Significant Findings

1. AI researchers confront unprecedented scale that goes well beyond generative AI
2. National investments in AI research resources have been insufficient
3. The suboptimal usability of current resources is compromising AI investigation topics
4. The cadence and intensity of AI conference publications is unlike other research areas
5. Better practices for managing local resources are needed
6. Access to AI research resources is very uneven for different institutions
7. There is an opportunity for greater alignment between CI and AI efforts
8. AI research needs warrant unique approaches to CI and to national shared resources

Critical Needs

Participants identified ten prototypical AI workflows in two major areas with an immediate need for large-scale resources.


HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics

arXiv.org Artificial Intelligence

Advanced applied mathematics problems are underrepresented in existing Large Language Model (LLM) benchmark datasets. To address this, we introduce HARDMath, a dataset inspired by a graduate course on asymptotic methods, featuring challenging applied mathematics problems that require analytical approximation techniques. These problems demand a combination of mathematical reasoning, computational tools, and subjective judgment, making them difficult for LLMs. Our framework auto-generates a large number of problems with solutions validated against numerical ground truths. We evaluate both open- and closed-source LLMs on HARDMath-mini, a sub-sampled test set of 366 problems, as well as on 40 word problems formulated in applied science contexts. Even leading closed-source models like GPT-4 achieve only 43.8% overall accuracy with few-shot Chain-of-Thought prompting, and all models demonstrate significantly lower performance compared to results on existing mathematics benchmark datasets. We additionally conduct a detailed error analysis to gain insights into the failure cases of LLMs. These results demonstrate limitations of current LLM performance on advanced graduate-level applied math problems and underscore the importance of datasets like HARDMath to advance mathematical abilities of LLMs.


Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT

arXiv.org Artificial Intelligence

The role of multiple-choice questions (MCQs) as effective learning tools has been debated in past research. While MCQs are widely used due to their ease of grading, open-response questions are increasingly used for instruction, given advances in large language models (LLMs) for automated grading. This study evaluates the effectiveness of MCQs relative to open-response questions, both individually and in combination, on learning. These activities are embedded within six tutor lessons on advocacy. Using a posttest-only randomized control design, we compare the performance of 234 tutors (790 lesson completions) across three conditions: MCQ only, open response only, and a combination of both. We find no significant learning differences across conditions at posttest, but tutors in the MCQ condition took significantly less time to complete instruction. These findings suggest that MCQs are as effective as, and more efficient than, open-response tasks for learning when practice time is limited. To further enhance efficiency, we autograded open responses using GPT-4o and GPT-4-turbo. GPT models demonstrate proficiency for purposes of low-stakes assessment, though further research is needed for broader use. This study contributes a dataset of lesson log data, human annotation rubrics, and LLM prompts to promote transparency and reproducibility.


DK-PRACTICE: An Intelligent Educational Platform for Personalized Learning Content Recommendations Based on Students Knowledge State

arXiv.org Artificial Intelligence

This study introduces DK-PRACTICE (Dynamic Knowledge Prediction and Educational Content Recommendation System), an intelligent online platform that leverages machine learning to provide personalized learning recommendations based on each student's knowledge state. Students participate in a short, adaptive assessment using the question-and-answer method regarding key concepts in a specific knowledge domain. The system dynamically selects the next question for each student based on the correctness and accuracy of their previous answers. After the test is completed, DK-PRACTICE analyzes students' interaction history to recommend learning materials that strengthen the student's knowledge where gaps were identified. Both question selection and learning material recommendations are based on machine learning models trained using anonymized data from a real learning environment. To provide self-assessment and monitor learning progress, DK-PRACTICE allows students to take two tests: one pre-teaching and one post-teaching. After each test, a report is generated with detailed results. In addition, the platform offers functions to visualize learning progress based on recorded test statistics. DK-PRACTICE promotes adaptive and personalized learning by empowering students with self-assessment capabilities and providing instructors with valuable information about students' knowledge levels. DK-PRACTICE can be extended to various educational environments and knowledge domains, provided the necessary data is available for the educational topics concerned. A subsequent paper will present the methodology for the experimental application and evaluation of the platform.