Goto

Collaborating Authors

 Generative AI


Integrating Universal Generative AI Platforms in Educational Labs to Foster Critical Thinking and Digital Literacy

arXiv.org Artificial Intelligence

This paper presents a new educational framework for integrating generative artificial intelligence (GenAI) platforms such as ChatGPT, Claude, and Gemini into laboratory activities aimed at developing critical thinking and digital literacy among undergraduate students. Recognizing the limitations and risks of uncritical reliance on large language models (LLMs), the proposed pedagogical model reframes GenAI as a research subject and cognitive tool. Students formulate discipline-specific prompts and evaluate GenAI-generated responses in text, image, and video modalities. A pilot implementation in a general astronomy course for non-science majors demonstrated high levels of engagement and critical reflection, with many students continuing the activity after class and presenting results at a research symposium. The results highlight the importance of structured AI interactions in education and suggest that GenAI can improve learning outcomes when combined with reflective assessment methods. The study proposes a replicable model for interdisciplinary AI-integrated lab work, adaptable to scientific disciplines. See the guide to learning activities based on Generative-Ai platforms: https://doi.org/10.5281/zenodo.15555802


Benchmarking the Pedagogical Knowledge of Large Language Models

arXiv.org Artificial Intelligence

Benchmarks like Massive Multitask Language Understanding (MMLU) have played a pivotal role in evaluating AI's knowledge and abilities across diverse domains. However, existing benchmarks predominantly focus on content knowledge, leaving a critical gap in assessing models' understanding of pedagogy - the method and practice of teaching. This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their Cross-Domain Pedagogical Knowledge (CDPK) and Special Education Needs and Disability (SEND) pedagogical knowledge. These benchmarks are built on a carefully curated set of questions sourced from professional development exams for teachers, which cover a range of pedagogical subdomains such as teaching strategies and assessment methods. Here we outline the methodology and development of these benchmarks. We report results for 97 models, with accuracies spanning a range from 28% to 89% on the pedagogical knowledge questions. We consider the relationship between cost and accuracy and chart the progression of the Pareto value frontier over time. We provide online leaderboards at https://rebrand.ly/pedagogy which are updated with new models and allow interactive exploration and filtering based on various model properties, such as cost per token and open-vs-closed weights, as well as looking at performance in different subjects. LLMs and generative AI have tremendous potential to influence education and help to address the global learning crisis. Education-focused benchmarks are crucial to measure models' capacities to understand pedagogical concepts, respond appropriately to learners' needs, and support effective teaching practices across diverse contexts. They are needed for informing the responsible and evidence-based deployment of LLMs and LLM-based tools in educational settings, and for guiding both development and policy decisions.


Here's What Mark Zuckerberg Is Offering Top AI Talent

WIRED

As Mark Zuckerberg staffs up Meta's new superintelligence lab, he's offering top research talent pay packages of up to 300 million over four years, with more than 100 million in total compensation for the first year, WIRED has learned. Meta has made at least 10 of these staggeringly high offers to OpenAI staffers. One high-ranking researcher was pitched on the role of chief scientist but turned it down, according to multiple sources with direct knowledge of the negotiations. While the pay package includes equity, in the first year the stock vests immediately, sources say. "That's about how much it would take for me to go work at Meta," says one OpenAI staffer who spoke with WIRED on the condition of anonymity as they aren't authorized to speak publicly about the company.


Sam Altman Slams Meta's AI Talent Poaching Spree: 'Missionaries Will Beat Mercenaries'

WIRED

OpenAI CEO Sam Altman is hitting back at Meta CEO Mark Zuckerberg's recent AI talent poaching spree. In a full-throated response sent to OpenAI researchers Monday evening and obtained by WIRED, Altman made his pitch for why staying at OpenAI is the only answer for those looking to build artificial general intelligence, hinting that the company is evaluating compensation for the entire research organization. He also dismissed Meta's recruiting efforts, saying what the company is doing could lead to deep cultural problems down the road. "We have gone from some nerds in the corner to the most interesting people in the tech industry (at least)," he wrote on Slack. "Al Twitter is toxic; Meta is acting in a way that feels somewhat distasteful; I assume things will get even crazier in the future. After I got fired and came back I said that was not the craziest thing that would happen in OpenAl history; certainly neither is this."


Towards Time Series Generation Conditioned on Unstructured Natural Language

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (AI) has rapidly become a powerful tool, capable of generating various types of data, such as images and text. However, despite the significant advancement of generative AI, time series generative AI remains underdeveloped, even though the application of time series is essential in finance, climate, and numerous fields. In this research, we propose a novel method of generating time series conditioned on unstructured natural language descriptions. We use a diffusion model combined with a language model to generate time series from the text. Through the proposed method, we demonstrate that time series generation based on natural language is possible. The proposed method can provide various applications such as custom forecasting, time series manipulation, data augmentation, and transfer learning. Furthermore, we construct and propose a new public dataset for time series generation, consisting of 63,010 time series-description pairs.


Innovative Research on IoT Architecture and Robotic Operating Platforms: Applications of Large Language Models and Generative AI

arXiv.org Artificial Intelligence

This paper introduces an innovative design for robotic operating platforms, underpinned by a transformative Internet of Things (IoT) architecture, seamlessly integrating cutting-edge technologies such as large language models (LLMs), generative AI, edge computing, and 5G networks. The proposed platform aims to elevate the intelligence and autonomy of IoT systems and robotics, enabling them to make real-time decisions and adapt dynamically to changing environments. Through a series of compelling case studies across industries including smart manufacturing, healthcare, and service sectors, this paper demonstrates the substantial potential of IoT-enabled robotics to optimize operational workflows, enhance productivity, and deliver innovative, scalable solutions. By emphasizing the roles of LLMs and generative AI, the research highlights how these technologies drive the evolution of intelligent robotics and IoT, shaping the future of industry-specific advancements. The findings not only showcase the transformative power of these technologies but also offer a forward-looking perspective on their broader societal and industrial implications, positioning them as catalysts for next-generation automation and technological convergence.


Computational Analysis of Climate Policy

arXiv.org Artificial Intelligence

This thesis explores the impact of the Climate Emergency movement on local government climate policy, using computational methods. The Climate Emergency movement sought to accelerate climate action at local government level through the mechanism of Climate Emergency Declarations (CEDs), resulting in a series of commitments from councils to treat climate change as an emergency. With the aim of assessing the potential of current large language models to answer complex policy questions, I first built and configured a system named PALLM (Policy Analysis with a Large Language Model), using the OpenAI model GPT-4. This system is designed to apply a conceptual framework for climate emergency response plans to a dataset of climate policy documents. I validated the performance of this system with the help of local government policymakers, by generating analyses of the climate policies of 11 local governments in Victoria and assessing the policymakers' level of agreement with PALLM's responses. Having established that PALLM's performance is satisfactory, I used it to conduct a large-scale analysis of current policy documents from local governments in the state of Victoria, Australia. This thesis presents the methodology and results of this analysis, comparing the results for councils which have passed a CED to those which did not. This study finds that GPT-4 is capable of high-level policy analysis, with limitations including a lack of reliable attribution, and can also enable more nuanced analysis by researchers. Its use in this research shows that councils which have passed a CED are more likely to have a recent and climate-specific policy, and show more attention to urgency, prioritisation, and equity and social justice, than councils which have not. It concludes that the ability to assess policy documents at scale opens up exciting new opportunities for policy researchers.


From Model Design to Organizational Design: Complexity Redistribution and Trade-Offs in Generative AI

arXiv.org Artificial Intelligence

We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization, creating new managerial challenges, particularly around accuracy in high-stakes applications. We contend that competitive advantage no longer stems from mere AI adoption, but from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and complementary expertise.


Improving Student-AI Interaction Through Pedagogical Prompting: An Example in Computer Science Education

arXiv.org Artificial Intelligence

With the proliferation of large language model (LLM) applications since 2022, their use in education has sparked both excitement and concern. Recent studies consistently highlight students' (mis)use of LLMs can hinder learning outcomes. This work aims to teach students how to effectively prompt LLMs to improve their learning. We first proposed pedagogical prompting, a theoretically-grounded new concept to elicit learning-oriented responses from LLMs. To move from concept design to a proof-of-concept learning intervention in real educational settings, we selected early undergraduate CS education (CS1/CS2) as the example context. We began with a formative survey study with instructors (N=36) teaching early-stage undergraduate-level CS courses to inform the instructional design based on classroom needs. Based on their insights, we designed and developed a learning intervention through an interactive system with scenario-based instruction to train pedagogical prompting skills. Finally, we evaluated its instructional effectiveness through a user study with CS novice students (N=22) using pre/post-tests. Through mixed methods analyses, our results indicate significant improvements in learners' LLM-based pedagogical help-seeking skills, along with positive attitudes toward the system and increased willingness to use pedagogical prompts in the future. Our contributions include (1) a theoretical framework of pedagogical prompting; (2) empirical insights into current instructor attitudes toward pedagogical prompting; and (3) a learning intervention design with an interactive learning tool and scenario-based instruction leading to promising results on teaching LLM-based help-seeking. Our approach is scalable for broader implementation in classrooms and has the potential to be integrated into tools like ChatGPT as an on-boarding experience to encourage learning-oriented use of generative AI.


AI Risk-Management Standards Profile for General-Purpose AI (GPAI) and Foundation Models

arXiv.org Artificial Intelligence

Increasingly multi-purpose AI models, such as cutting-edge large language models or other 'general-purpose AI' (GPAI) models, 'foundation models,' generative AI models, and 'frontier models' (typically all referred to hereafter with the umbrella term 'GPAI/foundation models' except where greater specificity is needed), can provide many beneficial capabilities but also risks of adverse events with profound consequences. This document provides risk-management practices or controls for identifying, analyzing, and mitigating risks of GPAI/foundation models. We intend this document primarily for developers of large-scale, state-of-the-art GPAI/foundation models; others that can benefit from this guidance include downstream developers of end-use applications that build on a GPAI/foundation model. This document facilitates conformity with or use of leading AI risk management-related standards, adapting and building on the generic voluntary guidance in the NIST AI Risk Management Framework and ISO/IEC 23894, with a focus on the unique issues faced by developers of GPAI/foundation models.