Instructional Material
Stochastic Online Greedy Learning with Semi-bandit Feedbacks
The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem in which the input to the greedy algorithm is stochastic, with unknown parameters that have to be learned over time. We first propose the greedy regret and ε-quasi greedy regret as learning metrics that compare against the performance of the offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedbacks, which use multi-armed bandit and pure exploration bandit policies at each level of greedy learning, one for each of the regret metrics respectively. Both algorithms achieve an O(log T) problem-dependent regret bound (T being the time horizon) for a general class of combinatorial structures and reward functions that allow greedy solutions. We further show that the bound is tight in T and other problem instance parameters.
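To make the idea of running a bandit policy at each level of greedy learning concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm. It assumes a "pick k items" structure, rewards in [0, 1], and a standard UCB1 index kept separately for every (prefix, item) pair. The names online_greedy_ucb, sample_marginal, and bernoulli_marginal are hypothetical, introduced only for illustration.

```python
import math
import random
from collections import defaultdict


def online_greedy_ucb(items, k, sample_marginal, horizon, seed=0):
    """Illustrative online greedy loop with a UCB1 index at each greedy level.

    sample_marginal(prefix, item, rng) is an assumed environment callback that
    returns the observed marginal reward (in [0, 1]) of adding `item` to
    `prefix`. This plays the role of semi-bandit feedback.
    """
    rng = random.Random(seed)
    counts = defaultdict(int)    # plays of each (prefix, item) pair
    means = defaultdict(float)   # empirical mean marginal reward per pair
    last_solution = ()

    for t in range(1, horizon + 1):
        prefix = ()
        for _ in range(k):
            candidates = [i for i in items if i not in prefix]

            def ucb_index(item):
                n = counts[(prefix, item)]
                if n == 0:
                    return math.inf  # force one play of every unseen pair
                bonus = math.sqrt(2.0 * math.log(t) / n)
                return means[(prefix, item)] + bonus

            choice = max(candidates, key=ucb_index)
            reward = sample_marginal(prefix, choice, rng)

            # Incremental update of the empirical mean for this level.
            key = (prefix, choice)
            counts[key] += 1
            means[key] += (reward - means[key]) / counts[key]
            prefix = prefix + (choice,)
        last_solution = prefix

    return last_solution, counts


def bernoulli_marginal(probs):
    """Assumed toy environment: modular Bernoulli rewards, one mean per item."""
    def sample(prefix, item, rng):
        return 1.0 if rng.random() < probs[item] else 0.0
    return sample


if __name__ == "__main__":
    env = bernoulli_marginal([0.9, 0.5, 0.2, 0.1])
    solution, _ = online_greedy_ucb(range(4), k=2, sample_marginal=env, horizon=2000)
    print("solution played in the final round:", solution)
```

Keying the statistics on the prefix keeps each greedy level an independent bandit problem, which is the design choice the sketch is meant to show. With purely modular rewards the prefix could be dropped from the key.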
Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform
Lee, Yong Oh, Bang, Byeonghun, Lee, Joohyun, Oh, Sejun
As personalized learning gains increasing attention in mathematics education, there is a growing demand for intelligent systems that can assess complex student responses and provide individualized feedback in real time. In this study, we present a personalized auto-grading and feedback system for constructive geometry tasks, developed using large language models (LLMs) and deployed on the Algeomath platform, a Korean online tool designed for interactive geometric constructions. The proposed system evaluates student-submitted geometric constructions by analyzing their procedural accuracy and conceptual understanding. It employs a prompt-based grading mechanism using GPT-4, where student answers and model solutions are compared through a few-shot learning approach. Feedback is generated based on teacher-authored examples built from anticipated student responses, and it dynamically adapts to the student's problem-solving history, allowing up to four iterative attempts per question. The system was piloted with 79 middle-school students, where LLM-generated grades and feedback were benchmarked against teacher judgments. Grading closely aligned with teachers, and feedback helped many students revise errors and complete multi-step geometry tasks. While short-term corrections were frequent, longer-term transfer effects were less clear. Overall, the study highlights the potential of LLMs to support scalable, teacher-aligned formative assessment in mathematics, while pointing to improvements needed in terminology handling and feedback design.
Boolean Satisfiability via Imitation Learning
Zhang, Zewei, Liu, Huan, Yu, Yuanhao, Chen, Jun, Xu, Xiangyu
We propose ImitSAT, a branching policy for conflict-driven clause learning (CDCL) solvers based on imitation learning for the Boolean satisfiability problem (SAT). Unlike previous methods that predict instance-level signals to improve CDCL branching indirectly, or rely on reinforcement learning and insufficient CDCL information to enhance branching, ImitSAT learns from expert KeyTrace that collapses a full run into the sequence of surviving decisions. Replaying a KeyTrace on the same instance is nearly conflict-free, providing dense decision-level supervision and directly reducing propagations, the dominant contributor to wall-clock time. This prefix-conditioned supervision enables ImitSAT to reproduce high-quality branches without exploration, yielding faster convergence, stable training, and seamless integration into CDCL. Extensive experiments demonstrate that ImitSAT reduces propagation counts and runtime, outperforming state-of-the-art learned approaches. The Boolean satisfiability (SAT) problem is a cornerstone of theoretical computer science and artificial intelligence (Cook, 1971; Karp, 1972). Beyond its foundational role, SAT serves as the computational backbone of numerous applications, including formal verification, planning, and combinatorial optimization. Modern solvers for SAT are dominated by the conflict-driven clause learning (CDCL) framework (Silva & Sakallah, 1996; Biere et al., 2009), which has scaled to industrial benchmarks of immense complexity. A CDCL run interleaves branching, unit propagation, and conflict analysis. Among these components, the branching rule largely determines the search trajectory, while unit propagation often dominates runtime (Zhang & Malik, 2002; Davis et al., 2008; Moskewicz et al., 2001). As a result, more informed branching decisions can translate directly into faster solving.
A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects
Linåker, Johan, Osborne, Cailean, Ding, Jennifer, Burtenshaw, Ben
The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied, limiting our understanding of how open LLM projects are initiated, organized, and governed as well as what opportunities there are to foster this ecosystem even further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.
Cognifying Education: Mapping AI's transformative role in emotional, creative, and collaborative learning
Artificial intelligence (AI) is rapidly reshaping educational practice, challenging long-held assumptions about teaching and learning. This article integrates conceptual perspectives from recent books (Genesis by Eric Schmidt, Henry Kissinger and Craig Mundie, Co-Intelligence by Ethan Mollick, and The Inevitable by Kevin Kelly) with empirical insights from popular AI podcasts and Anthropic's public releases. We examine seven key domains: emotional support, creativity, contextual understanding, student engagement, problem solving, ethics and morality, and collaboration. For each domain, we explore AI capabilities, opportunities for transformative change, and emerging best practices, drawing equally from theoretical analysis and real-world observations. Overall, we find that AI, when used thoughtfully, can complement and enhance human educators in fostering richer learning experiences across cognitive, social, and emotional dimensions. We emphasize an optimistic yet responsible outlook: educators and students should actively shape AI integration to amplify human potential in creativity, ethical reasoning, collaboration, and beyond, while maintaining a focus on human-centric values.
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Chen, Yongchao, Liu, Yueying, Zhou, Junwei, Hao, Yilun, Wang, Jingquan, Zhang, Yang, Li, Na, Fan, Chuchu
Practical guidance on training Large Language Models (LLMs) to leverage Code Interpreter across diverse tasks remains lacking. Our final model, R1-CI-14B, improves average accuracy on the 37 test tasks from 44.1% to 72.4%. Notably, R1-CI-14B also exhibits emergent self-checking behavior through code generation. While reinforcement learning (RL)-based fine-tuning has significantly improved LLMs' reasoning and planning ... In contrast, symbolic code generation handles these rigorously and benefits from external tools (e.g., ...). A key challenge is guiding LLMs to decide when to rely on textual reasoning versus programmatic solutions, given that most input questions lack explicit cues about which approach is best and the possible text/code solution space is large. OpenAI's GPT models address this by incorporating a Code Interpreter, allowing iterative code generation ... Interpreter implementations struggle to effectively steer between text and code, underutilizing symbolic capabilities. Recent work such as ToRL (Li et al., 2025b) and ReTool (Feng et al., 2025) investigates training reasoning models to integrate with Code Interpreters. To tackle these challenges, we present R1-Code-Interpreter, a framework for integrating Code Interpreter into open-source LLMs. We curate 144 reasoning and planning tasks and synthesize 6.5k multi-turn text/code trajectories for ... This difficulty arises from task heterogeneity and the scarcity of effective samples.
The Real Stakes, and Real Story, of Peter Thiel's Antichrist Obsession
Thirty years ago, a peace-loving Austrian theologian spoke to Peter Thiel about the apocalyptic theories of Nazi jurist Carl Schmitt. They've been a road map for the billionaire ever since. For a full two years now, the billionaire has been on the circuit, spreading his biblically inflected ideas about doomsday through a set of variably and sometimes visibly perplexed interviewers. He has chatted onstage with the economist podcaster Tyler Cowen about the katechon (the scriptural term for "that which withholds" the end times); traded some very awkward on-camera silences with the New York Times columnist Ross Douthat; and is, at this very moment, in the midst of delivering a four-part, off-the-record lecture series about the Antichrist in San Francisco. Depending on who you are, you may find it hilarious, fascinating, insufferable, or horrifying that one of the world's most powerful men is obsessing over a figure from sermons and horror movies. But the ideas and influences behind these talks are key to understanding how Thiel sees his own massive role in the world--in politics, technology, and the fate of the species. And to really grasp Thiel's katechon-and-Antichrist schtick, you need to go back to the first major lecture of his doomsday road show--which took place on an unusually hot day in Paris in 2023. No video cameras recorded the event, and no reporters wrote about it, but I've been able to reconstruct it by talking to people who were there. The venue was a yearly conference of scholars devoted to Thiel's chief intellectual influence, the late French-American theorist René Girard. On the evening of the unpublicized lecture, dozens of Girardian philosophers and theologians from around the world filed into a modest lecture hall at the Catholic University of Paris. And from the dais, Thiel delivered a nearly hourlong account of his thoughts on Armageddon--and all the things he believed were "not enough" to prevent it.
By Thiel's telling, the modern world is scared, way too scared, of its own technology. Our "listless" and "zombie" age, he said, is marked by a growing hostility to innovation, plummeting fertility rates, too much yoga, and a culture mired in the "endless Groundhog Day of the worldwide web." But in its neurotic desperation to avoid technological Armageddon--the real threats of nuclear war, environmental catastrophe, runaway AI--modern civilization has become susceptible to something even more dangerous: the Antichrist. According to some Christian traditions, the Antichrist is a figure that will unify humanity under one rule before delivering us to the apocalypse. For Thiel, its evil is pretty much synonymous with any attempt to unite the world. "How might such an Antichrist rise to power?" Thiel asked.
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning
Zhang, Shijie, Sun, Guohao, Zhang, Kevin, Guo, Xiang, Guo, Rujun
Recently, online Reinforcement Learning with Verifiable Rewards (RLVR) has become a key paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically treat all training samples uniformly, overlooking the vast differences in problem difficulty relative to the model's current capabilities. This uniform training strategy leads to inefficient exploration of problems the model has already mastered, while concurrently lacking effective guidance on problems that are challenging its abilities the most, limiting both learning efficiency and upper-bound performance. To address this, we propose CLPO (Curriculum-guided Learning for Policy Optimization), a novel algorithm that creates a dynamic pedagogical feedback loop within the policy optimization process. The core of CLPO leverages the model's own rollout performance to conduct real-time difficulty assessment, thereby constructing an Online Curriculum. This curriculum then guides an Adaptive Problem Restructuring mechanism, where the model acts as its own teacher: it diversifies medium-difficulty problems to promote generalization and simplifies challenging problems to make them more attainable. Our approach transforms the static training procedure into a dynamic process that co-evolves with the model's capabilities. Experiments show that CLPO achieves state-of-the-art performance across eight challenging mathematical and general reasoning benchmarks, with an average pass@1 improvement of 6.96% over other methods, demonstrating its potential for more efficiently training more capable reasoning models.
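The difficulty-assessment step described above can be illustrated with a small sketch. This is an assumption-laden illustration, not the CLPO implementation. It takes per-problem rollout outcomes, buckets each problem by empirical accuracy, and maps medium problems to "diversify" and hard ones to "simplify". The thresholds (0.8 and 0.2) and the names assess_difficulty and plan_restructuring are hypothetical. The abstract does not specify them.

```python
from typing import Dict, List


def assess_difficulty(
    rollout_correct: Dict[str, List[bool]],
    mastered_at: float = 0.8,  # hypothetical threshold, not from the paper
    hard_at: float = 0.2,      # hypothetical threshold, not from the paper
) -> Dict[str, List[str]]:
    """Bucket problems by the model's own rollout accuracy."""
    buckets: Dict[str, List[str]] = {"mastered": [], "medium": [], "hard": []}
    for problem_id, outcomes in rollout_correct.items():
        if not outcomes:
            continue  # no rollouts yet, so difficulty is unknown
        accuracy = sum(outcomes) / len(outcomes)
        if accuracy >= mastered_at:
            buckets["mastered"].append(problem_id)
        elif accuracy <= hard_at:
            buckets["hard"].append(problem_id)
        else:
            buckets["medium"].append(problem_id)
    return buckets


def plan_restructuring(buckets: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each non-mastered problem to the restructuring action named above."""
    plan: Dict[str, str] = {}
    for problem_id in buckets["medium"]:
        plan[problem_id] = "diversify"  # rephrase to promote generalization
    for problem_id in buckets["hard"]:
        plan[problem_id] = "simplify"   # make the problem more attainable
    return plan


if __name__ == "__main__":
    rollouts = {
        "p1": [True] * 8,
        "p2": [True, False, True, False],
        "p3": [False] * 4,
    }
    print(plan_restructuring(assess_difficulty(rollouts)))
```

Recomputing the buckets after every batch of rollouts is what would make such a curriculum "online": a problem can move from hard to medium to mastered as the policy improves.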
Retrieval-augmented GUI Agents with Generative Guidelines
Xu, Ran, Ma, Kaixin, Yu, Wenhao, Zhang, Hongming, Ho, Joyce C., Yang, Carl, Yu, Dong
GUI agents powered by vision-language models (VLMs) show promise in automating complex digital tasks. However, their effectiveness in real-world applications is often limited by scarce training data and the inherent complexity of these tasks, which frequently require long-tailed knowledge covering rare, unseen scenarios. We propose RAG-GUI, a lightweight VLM that leverages web tutorials at inference time. RAG-GUI is first warm-started via supervised finetuning (SFT) and further refined through self-guided rejection sampling finetuning (RSF). Designed to be model-agnostic, RAG-GUI functions as a generic plug-in that enhances any VLM-based agent. Evaluated across three distinct tasks, it consistently outperforms baseline agents and surpasses other inference baselines by 2.6% to 13.3% across two model sizes, demonstrating strong generalization and practical plug-and-play capabilities in real-world scenarios.