Generative AI


Beyond Automation: Socratic AI, Epistemic Agency, and the Implications of the Emergence of Orchestrated Multi-Agent Learning Architectures

arXiv.org Artificial Intelligence

Generative AI is no longer a peripheral tool in higher education. It is rapidly evolving into a general-purpose infrastructure that reshapes how knowledge is generated, mediated, and validated. This paper presents findings from a controlled experiment evaluating a Socratic AI Tutor, a large language model designed to scaffold student research question development through structured dialogue grounded in constructivist theory. Conducted with 65 pre-service teacher students in Germany, the study compares interaction with the Socratic Tutor to engagement with an uninstructed AI chatbot. Students using the Socratic Tutor reported significantly greater support for critical, independent, and reflective thinking. This suggests that dialogic AI can stimulate metacognitive engagement, and it challenges recent narratives of de-skilling attributed to generative AI use. These findings serve as a proof of concept for a broader pedagogical shift: the use of multi-agent systems (MAS) composed of specialised AI agents. To conceptualise this, we introduce the notion of orchestrated MAS: modular, pedagogically aligned agent constellations, curated by educators, that support diverse learning trajectories through differentiated roles and coordinated interaction. To anchor this shift, we propose an adapted offer-and-use model, in which students appropriate instructional offers from these agents. Beyond technical feasibility, we examine system-level implications for higher education institutions and students, including funding needs and changes to faculty roles, curricula, competencies, and assessment practices. We conclude with a comparative cost-effectiveness analysis highlighting the scalability of such systems. In sum, this study contributes both empirical evidence and a conceptual roadmap for hybrid learning ecosystems that embed human-AI co-agency and pedagogical alignment.
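One way to picture an orchestrated MAS is as an educator-defined mapping from learning phases to specialised agents. The sketch below is a minimal illustration and not the authors' implementation. The phase names, the `Agent` and `Orchestrator` classes, and the two stub agents are all hypothetical. The stub functions stand in for what would be LLM calls in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# HYPOTHETICAL sketch of an educator-curated agent constellation.
# Each agent is reduced to a role name plus a function that turns a student
# message into a reply; a real system would call an LLM inside that function.


@dataclass
class Agent:
    role: str
    respond: Callable[[str], str]


@dataclass
class Orchestrator:
    # Educator-defined mapping from learning phase to the agent offered in that phase.
    agents: Dict[str, Agent]
    log: List[str] = field(default_factory=list)

    def offer(self, phase: str, message: str) -> str:
        """Route a student message to the agent curated for the given phase."""
        agent = self.agents.get(phase)
        if agent is None:
            raise KeyError(f"no agent curated for phase {phase!r}")
        reply = agent.respond(message)
        self.log.append(f"{phase}:{agent.role}")  # record which offer was made
        return reply


def socratic(message: str) -> str:
    # Stub Socratic agent: answers with a probing question instead of a solution.
    return f"What assumption underlies '{message}'?"


def feedback(message: str) -> str:
    # Stub feedback agent: comments on the length of a draft research question.
    return f"Your draft has {len(message.split())} words; is the scope narrow enough?"


orchestrator = Orchestrator(agents={
    "question_development": Agent("Socratic tutor", socratic),
    "drafting": Agent("feedback agent", feedback),
})
reply = orchestrator.offer("question_development", "AI harms learning")
print(reply)
```

In this sketch the orchestrator only makes offers, and the student decides how to use each reply. That loosely mirrors the offer-and-use framing described above.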


The New ChatGPT Resets the AI Race

The Atlantic - Technology

Yesterday evening, Sam Altman shared an image of the Death Star on X. There was no caption on the picture, which showed the world-destroying Star Wars space station rising over an Earth-like planet, but his audience understood the context. In fewer than 24 hours, OpenAI would release an AI model intended to wipe out all the rest. That model, GPT-5, launched earlier today with all the requisite fanfare. In an announcement video, Altman said that the product will serve as a "legitimate Ph.D.-level expert in anything--any area you need, on demand--that can help you with whatever your goals are."


OpenAI debuts GPT-5, paving the way for an even smarter ChatGPT

PCWorld

OpenAI on Thursday announced GPT-5, the foundational model for the next generation of ChatGPT and a "significant leap in intelligence," according to the company. GPT-5 will be released in two versions: a "pro" model accessible only to paid subscribers, and the basic GPT-5, which will be available to everyone, including users on ChatGPT's free plan. Plus subscribers will get more usage, OpenAI said. OpenAI also won't explicitly tell ChatGPT users which model is answering. Instead, GPT-5 introduces a "real-time router" that assigns a model based on the user's query.
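The basic idea of per-query routing can be shown with a toy heuristic. OpenAI has not published how the GPT-5 router decides, so everything below is an illustrative assumption: the model labels `"fast"` and `"reasoning"`, the keyword list, and the 40-word threshold.

```python
from typing import Literal

# HYPOTHETICAL: the real GPT-5 router's logic is not public.
# Model labels, keyword list, and word-count threshold are illustrative assumptions.
ModelChoice = Literal["fast", "reasoning"]

# Phrases that, in this sketch, signal a request for multi-step reasoning.
REASONING_HINTS = ("prove", "step by step", "debug", "analyze", "analyse")


def route(query: str, max_fast_words: int = 40) -> ModelChoice:
    """Pick a model for a query using a simple length-and-keyword heuristic."""
    text = query.lower()
    too_long = len(text.split()) > max_fast_words
    asks_for_reasoning = any(hint in text for hint in REASONING_HINTS)
    return "reasoning" if too_long or asks_for_reasoning else "fast"


print(route("What is the capital of France?"))                 # fast
print(route("Prove that the square root of 2 is irrational."))  # reasoning
```

A real router could instead rely on a trained classifier. This heuristic only makes the idea of assigning a model per query concrete.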


OpenAI claims GPT-5 model boosts ChatGPT to 'PhD level'

BBC News

OpenAI has highlighted GPT-5's ability to create software in its entirety and to demonstrate better reasoning capabilities, with answers that show workings, logic and inference. The company claims the model has been trained to be more honest and to give users more accurate responses, and says that, overall, it feels more human. According to Altman, the model is "significantly better" than its predecessors. "GPT-3 sort of felt to me like talking to a high school student... 4 felt like you're kind of talking to a college student," he said in a briefing ahead of Thursday's launch. "GPT-5 is the first time that it really feels like talking to an expert in any topic, like a PhD-level expert."


OpenAI says latest ChatGPT upgrade is big step forward but still can't do humans' jobs

The Guardian

OpenAI has claimed to have taken a "significant step" towards artificial general intelligence (AGI) with the launch of its latest upgrade to ChatGPT, but has admitted there are still "many things" missing in its quest to create a system able to do humans' jobs. The startup said its GPT-5 model, the underlying technology that will power its breakthrough AI chatbot, represents a big upgrade on its predecessors in areas such as coding and creative writing – and is also a lot less sycophantic. It said the upgrade was being made available to all of ChatGPT's 700 million weekly users immediately. Sam Altman, OpenAI's chief executive, called the model a "significant step forward" to achieving the theoretical state of AGI, which the startup defines as a highly autonomous system that outperforms humans at most economically valuable work – or, in other words, can do their jobs. However, Altman admitted GPT-5 had not reached that goal yet.


OpenAI Finally Launched GPT-5. Here's Everything You Need to Know

WIRED

OpenAI has begun rolling out GPT-5, the latest iteration of its flagship language model, to all ChatGPT users. The company's CEO Sam Altman called GPT-5 "a significant step along the path to AGI" during a press briefing on Wednesday. While he stopped short of claiming the model reaches artificial general intelligence, Altman noted the latest release is "clearly a model that is generally intelligent." He added that GPT-5 still lacks key traits that would make it reach AGI, a notably loose term that is defined in OpenAI's charter as "a highly autonomous system that outperforms humans at most economically valuable work." For example, the model still lacks the ability to learn continuously after deployment.


Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading

arXiv.org Artificial Intelligence

Research to improve Automated Short Answer Grading has recently focused on Large Language Models (LLMs) with prompt engineering and zero- or few-shot prompting to achieve the best results. This is in contrast to the fine-tuning approach, which has historically required large-scale compute clusters inaccessible to most users. New closed-model approaches such as OpenAI's fine-tuning service promise results with as few as 100 examples, while methods using open weights, such as quantized low-rank adaptation (QLoRA), can be used to fine-tune models on consumer GPUs. We evaluate both of these fine-tuning methods, measuring their interaction with few-shot prompting for automated short answer grading (ASAG) with structured (JSON) outputs. Our results show that fine-tuning with small amounts of data has limited utility for Llama open-weight models, but that fine-tuning methods can outperform few-shot baseline instruction-tuned LLMs for OpenAI's closed models. While our evaluation set is limited, we find some evidence that the observed benefits of fine-tuning may be affected by the domain subject matter. Lastly, we observed dramatic improvement with the Llama 3.1 8B-Instruct open-weight model by seeding the initial training examples with a significant amount of cheaply generated synthetic training data.
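Few-shot ASAG with structured output means placing graded examples in the prompt and asking the model to reply with a JSON grade. The following is a minimal sketch of the prompt-assembly and output-validation steps. The paper's actual prompt wording, JSON schema, and score scale are not given here. The `"score"` field (0 to 2), the `"rationale"` field, and the photosynthesis examples are assumptions.

```python
import json
from typing import Dict, List

# HYPOTHETICAL prompt format: the paper's prompts, JSON schema and score scale
# are not given in the abstract. "score" (0-2) and "rationale" are assumptions.


def build_prompt(question: str, examples: List[Dict], answer: str) -> str:
    """Assemble a few-shot grading prompt whose example grades are JSON objects."""
    lines = [
        f"Question: {question}",
        'Grade the final answer. Reply only with JSON: {"score": <0-2>, "rationale": "<text>"}',
    ]
    for ex in examples:  # each graded example becomes one in-context demonstration
        lines.append(f"Answer: {ex['answer']}")
        lines.append("Grade: " + json.dumps({"score": ex["score"], "rationale": ex["rationale"]}))
    lines.append(f"Answer: {answer}")
    lines.append("Grade:")  # the model is expected to complete this line with JSON
    return "\n".join(lines)


def parse_grade(raw: str, max_score: int = 2) -> int:
    """Parse the model's JSON output and reject malformed or out-of-range scores."""
    data = json.loads(raw)
    score = int(data["score"])
    if not 0 <= score <= max_score:
        raise ValueError(f"score {score} outside 0..{max_score}")
    return score


examples = [
    {"answer": "Plants make glucose from light, water and CO2.", "score": 2, "rationale": "complete"},
    {"answer": "Plants eat sunlight.", "score": 0, "rationale": "misconception"},
]
prompt = build_prompt("What is photosynthesis?", examples, "Plants use light to make food.")
print(prompt)
print(parse_grade('{"score": 1, "rationale": "partially correct"}'))  # 1
```

Fine-tuning, by contrast, moves such answer/grade pairs from the prompt into training data. The study measures how the two approaches interact.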


Personalized Knowledge Transfer Through Generative AI: Contextualizing Learning to Individual Career Goals

arXiv.org Artificial Intelligence

As artificial intelligence becomes increasingly integrated into digital learning environments, the personalization of learning content to reflect learners' individual career goals offers promising potential to enhance engagement and long-term motivation. In our study, we investigate how career goal-based content adaptation in learning systems based on generative AI (GenAI) influences learner engagement, satisfaction, and study efficiency. The mixed-methods experiment involved more than 4,000 learners, split into an experimental group receiving learning scenarios tailored to their career goals and a control group receiving standard content. Quantitative results show increased session duration, higher satisfaction ratings, and a modest reduction in study duration compared to standard content. Qualitative analysis highlights that learners found the personalized material motivating and practical, enabling deep cognitive engagement and strong identification with the content. These findings underscore the value of aligning educational content with learners' career goals and suggest that scalable AI personalization can bridge academic knowledge and workplace applicability.


Automated Generation of Curriculum-Aligned Multiple-Choice Questions for Malaysian Secondary Mathematics Using Generative AI

arXiv.org Artificial Intelligence

This paper addresses the critical need for scalable and high-quality educational assessment tools within the Malaysian education system. It highlights the potential of Generative AI (GenAI) while acknowledging the significant challenges of ensuring factual accuracy and curriculum alignment, especially for low-resource languages like Bahasa Melayu. This research introduces and compares four incremental pipelines for generating Form 1 Mathematics multiple-choice questions (MCQs) in Bahasa Melayu using OpenAI's GPT-4o. The methods range from non-grounded prompting (structured and basic) to Retrieval-Augmented Generation (RAG) approaches (one using the LangChain framework, one implemented manually). The system is grounded in official curriculum documents, including teacher-prepared notes and the yearly teaching plan (RPT). A dual-pronged automated evaluation framework is employed to assess the generated questions. Curriculum alignment is measured using Semantic Textual Similarity (STS) against the RPT, while contextual validity is verified through a novel RAG-based Question-Answering (RAG-QA) method. The results demonstrate that RAG-based pipelines significantly outperform non-grounded prompting methods, producing questions with higher curriculum alignment and factual validity. The study further analyzes the trade-offs between the ease of implementation of framework-based RAG and the fine-grained control offered by a manual pipeline. This work presents a validated methodology for generating curriculum-specific educational content in a low-resource language, introduces a symbiotic RAG-QA evaluation technique, and provides actionable insights for the development and deployment of practical EdTech solutions in Malaysia and similar regions.
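The curriculum-alignment check compares each generated question with entries of the yearly teaching plan (RPT). STS is normally computed with sentence embeddings, but the abstract does not name the model. This sketch therefore substitutes bag-of-words cosine similarity, which rewards only shared words and not shared meaning. Taking the best match across all RPT entries is also an assumed aggregation. The RPT entries and the question below are hypothetical examples (in English: "fractions and decimals", "positive and negative integers", "Convert the fraction 3/4 to a decimal").

```python
import math
import re
from collections import Counter
from typing import List

# ASSUMPTION: bag-of-words cosine stands in for embedding-based STS.
# It only captures lexical overlap between a question and a plan entry.


def vectorize(text: str) -> Counter:
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def alignment_score(question: str, plan_entries: List[str]) -> float:
    """Best similarity between a generated question and any yearly-plan (RPT) entry."""
    q = vectorize(question)
    return max((cosine(q, vectorize(entry)) for entry in plan_entries), default=0.0)


# Hypothetical RPT entries: "fractions and decimals", "positive and negative integers".
rpt = ["pecahan dan perpuluhan", "integer positif dan negatif"]
score = alignment_score("Tukarkan pecahan 3/4 kepada perpuluhan", rpt)
print(round(score, 3))  # 0.471
```

Here the question shares two words with the first entry, giving 2 / (√6 · √3) ≈ 0.471. It shares none with the second entry.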


Data and AI governance: Promoting equity, ethics, and fairness in large language models

arXiv.org Artificial Intelligence

In this paper, we cover approaches to systematically govern, assess and quantify bias across the complete life cycle of machine learning models, from initial development and validation to ongoing production monitoring and guardrail implementation. Building upon our foundational work on the Bias Evaluation and Assessment Test Suite (BEATS) for Large Language Models, we describe prevalent bias- and fairness-related gaps in Large Language Models (LLMs) and discuss a data and AI governance framework to address Bias, Ethics, Fairness, and Factuality within LLMs. The data and AI governance approach discussed in this paper is suitable for practical, real-world applications, enabling rigorous benchmarking of LLMs prior to production deployment, facilitating continuous real-time evaluation, and proactively governing LLM-generated responses. By implementing data and AI governance across the life cycle of AI development, organizations can significantly enhance the safety and responsibility of their GenAI systems, effectively mitigating risks of discrimination and protecting against potential reputational or brand-related harm. Ultimately, through this article, we aim to contribute to advancing the creation and deployment of socially responsible and ethically aligned generative AI-powered applications.