Generative AI
Human-in-the-Loop Systems for Adaptive Learning Using Generative AI
Tarun, Bhavishya, Du, Haoze, Kannan, Dinesh, Gehringer, Edward F.
A Human-in-the-Loop (HITL) approach leverages generative AI to enhance personalized learning by directly integrating student feedback into AI-generated solutions. Students critique and modify AI responses using predefined feedback tags, fostering deeper engagement and understanding. This empowers students to actively shape their learning, with AI serving as an adaptive partner. The system uses a tagging technique and prompt engineering to personalize content, informing a Retrieval-Augmented Generation (RAG) system to retrieve relevant educational material and adjust explanations in real time. This builds on existing research in adaptive learning, demonstrating how student-driven feedback loops can modify AI-generated responses for improved student retention and engagement, particularly in STEM education. Preliminary findings from a study with STEM students indicate improved learning outcomes and confidence compared to traditional AI tools. This work highlights AI's potential to create dynamic, feedback-driven, and personalized learning environments through iterative refinement.
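As a rough illustration of how predefined feedback tags might steer a revised response, the sketch below maps tags to prompt instructions and combines them with retrieved passages. The tag names, the instruction wording, and the function `build_refinement_prompt` are all hypothetical; the abstract does not specify the actual tag set, prompt templates, or retrieval pipeline.

```python
# Hypothetical feedback tags; the abstract does not list the real tag set.
TAG_INSTRUCTIONS = {
    "too_advanced": "Explain the idea again using simpler vocabulary.",
    "needs_example": "Add one concrete worked example.",
    "too_long": "Shorten the explanation to the essential steps.",
}


def build_refinement_prompt(question, previous_answer, tags, retrieved_passages):
    """Assemble a follow-up prompt from student feedback tags and retrieved context."""
    unknown = [t for t in tags if t not in TAG_INSTRUCTIONS]
    if unknown:
        raise ValueError(f"Unknown feedback tags: {unknown}")

    # Turn each tag into an explicit instruction for the model.
    instructions = "\n".join(f"- {TAG_INSTRUCTIONS[t]}" for t in tags)
    # Number the retrieved passages so the model can refer to them.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved_passages))

    return (
        f"Question: {question}\n\n"
        f"Previous answer: {previous_answer}\n\n"
        f"Student feedback:\n{instructions}\n\n"
        f"Reference material:\n{context}\n\n"
        "Revise the previous answer to address the feedback."
    )
```

In a setup like this, the retrieved passages would come from whatever retrieval step the system uses; here they are simply passed in as a list of strings.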
Can Multi-modal (reasoning) LLMs detect document manipulation?
Liang, Zisheng, Zewde, Kidus, Singh, Rudra Pratap, Patil, Disha, Chen, Zexi, Xue, Jiayu, Yao, Yao, Chen, Yifei, Liu, Qinzhe, Ren, Simiao
Document fraud poses a significant threat to industries reliant on secure and verifiable documentation, necessitating robust detection mechanisms. This study investigates the efficacy of state-of-the-art multi-modal large language models (LLMs), including OpenAI O1, OpenAI 4o, Gemini Flash (thinking), Deepseek Janus, Grok, Llama 3.2 and 4, Qwen 2 and 2.5 VL, Mistral Pixtral, and Claude 3.5 and 3.7 Sonnet, in detecting fraudulent documents. We benchmark these models against each other and prior work on document fraud detection techniques using a standard dataset with real transactional documents. Through prompt optimization and detailed analysis of the models' reasoning processes, we evaluate their ability to identify subtle indicators of fraud, such as tampered text, misaligned formatting, and inconsistent transactional sums. Our results reveal that top-performing multi-modal LLMs demonstrate superior zero-shot generalization, outperforming conventional methods on out-of-distribution datasets, while several vision LLMs exhibit inconsistent or subpar performance. Notably, model size and advanced reasoning capabilities show limited correlation with detection accuracy, suggesting task-specific fine-tuning is critical. This study underscores the potential of multi-modal LLMs in enhancing document fraud detection systems and provides a foundation for future research into interpretable and scalable fraud mitigation strategies.
JELAI: Integrating AI and Learning Analytics in Jupyter Notebooks
Torre, Manuel Valle, van der Velden, Thom, Specht, Marcus, Oertel, Catharine
Generative AI offers potential for educational support, but often lacks pedagogical grounding and awareness of the student's learning context. Furthermore, researching student interactions with these tools within authentic learning environments remains challenging. To address this, we present JELAI, an open-source platform architecture designed to integrate fine-grained Learning Analytics (LA) with Large Language Model (LLM)-based tutoring directly within a Jupyter Notebook environment. JELAI employs a modular, containerized design featuring JupyterLab extensions for telemetry and chat, alongside a central middleware handling LA processing and context-aware LLM prompt enrichment. This architecture enables the capture of integrated code interaction and chat data, facilitating real-time, context-sensitive AI scaffolding and research into student behaviour. We describe the system's design and implementation, and demonstrate its feasibility through system performance benchmarks and two proof-of-concept use cases illustrating its capabilities for logging multi-modal data, analysing help-seeking patterns, and supporting A/B testing of AI configurations. JELAI's primary contribution is its technical framework, providing a flexible tool for researchers and educators to develop, deploy, and study LA-informed AI tutoring within the widely used Jupyter ecosystem.
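The idea of context-aware prompt enrichment can be sketched as follows: a middleware step takes the student's chat message and prepends learning-analytics context before the prompt reaches the LLM. This is a minimal sketch and not JELAI's actual code; the `NotebookContext` fields, the `enrich_prompt` function, and the tutor instruction text are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NotebookContext:
    """Hypothetical snapshot of telemetry for one student session."""
    active_cell_source: str
    last_error: str = ""
    recent_executions: int = 0


def enrich_prompt(student_message, ctx, max_code_chars=500):
    """Combine a chat message with notebook telemetry into one LLM prompt."""
    # Assumed system-style instruction; a real deployment would configure this.
    parts = ["You are a tutor. Guide the student without giving the full solution."]

    # Truncate long cells so the prompt stays within a bounded size.
    code = ctx.active_cell_source[:max_code_chars]
    parts.append(f"Current cell:\n{code}")

    # Only mention an error if the telemetry recorded one.
    if ctx.last_error:
        parts.append(f"Most recent error: {ctx.last_error}")

    parts.append(f"Cell executions in this session: {ctx.recent_executions}")
    parts.append(f"Student message: {student_message}")
    return "\n\n".join(parts)
```

Keeping enrichment in one function like this would also make it straightforward to swap in alternative prompt variants, for example when comparing AI configurations in an A/B test.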
Developers Say GPT-5 Is a Mixed Bag
When OpenAI launched GPT-5 last week, it told software engineers the model was designed to be a "true coding collaborator" that excels at generating high-quality code and performing agentic, or automated, software tasks. While the company didn't say so explicitly, OpenAI appeared to be taking direct aim at Anthropic's Claude Code, which has quickly become many developers' favored tool for AI-assisted coding. But developers tell WIRED that GPT-5 has been a mixed bag so far. It shines at technical reasoning and planning coding tasks, but some say that Anthropic's newest Opus and Sonnet reasoning models still produce better code. Depending on which version of GPT-5 developers are using--low, medium, or high verbosity--the model can be more elaborative, which sometimes leads it to generate unnecessary or redundant lines of code.
Government Documents Show Police Disabling AI Oversight Tools
In April 2024, the American police tech firm Axon, which leads the market for police body cameras, released a tool it billed as "revolutionary": Draft One, an AI-powered software package that would turn body camera footage and audio into intelligible police reports. Once best known for developing the Taser, Axon has transformed into a $50 billion military and law enforcement tech giant, providing more than 5,000 police departments across the country with a suite of cloud-based products to manage evidence collection and storage. Draft One, the AI tool, connects with the company's body cameras and evidence storage service to write police reports with little human intervention. At least 21 departments have experimented with the software. The use of artificial intelligence in generating police reports has been particularly troubling, according to civil rights advocacy groups like the Electronic Frontier Foundation and ACLU, because of generative AI's propensity towards racial and gender bias, and its tendency to insert inaccuracies into texts--including wholesale inventions known by technologists as "hallucinations." "I can almost guarantee [AI] reports have been used in plea deals," a police captain wrote.
Sam Altman Says ChatGPT Is on Track to Out-Talk Humanity
Never mind the GPT-5 complaints; Sam Altman says he believes ChatGPT is on track to have more conversations per day than all human beings combined. "If you project our growth forward, pretty soon billions of people a day will be talking to ChatGPT," said the CEO of OpenAI during a dinner with journalists in San Francisco. "ChatGPT will be having more conversations, maybe, than all human words put together, at some point. I think it's unreasonable to expect a single model personality or style to work for all of that." The remarks followed the chaotic launch of a long-awaited new flagship model, GPT-5, which some users felt had a less friendly and supportive personality. As part of the launch, OpenAI stopped offering users access to the prior model, GPT-4o.