Instructional Material
Online Pre-Training for Offline-to-Online Reinforcement Learning
Shin, Yongjae, Kim, Jeonghye, Jung, Whiyoung, Hong, Sunghoon, Yoon, Deunsol, Jang, Youngsoo, Kim, Geonhyeong, Chae, Jongseong, Sung, Youngchul, Lee, Kanghoon, Lim, Woohyung
Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
Circumventing Safety Alignment in Large Language Models Through Embedding Space Toxicity Attenuation
Zhang, Zhibo, Li, Yuxi, Wang, Kailong, Yuan, Shuai, Shi, Ling, Wang, Haoyu
Large Language Models (LLMs) have achieved remarkable success across domains such as healthcare, education, and cybersecurity. However, this openness also introduces significant security risks, particularly through embedding space poisoning, which is a subtle attack vector where adversaries manipulate the internal semantic representations of input data to bypass safety alignment mechanisms. While previous research has investigated universal perturbation methods, the dynamics of LLM safety alignment at the embedding level remain insufficiently understood. Consequently, more targeted and accurate adversarial perturbation techniques, which pose significant threats, have not been adequately studied. In this work, we propose ETTA (Embedding Transformation Toxicity Attenuation), a novel framework that identifies and attenuates toxicity-sensitive dimensions in embedding space via linear transformations. ETTA bypasses model refusal behaviors while preserving linguistic coherence, without requiring model fine-tuning or access to training data. Evaluated on five representative open-source LLMs using the AdvBench benchmark, ETTA achieves a high average attack success rate of 88.61%, outperforming the best baseline by 11.34%, and generalizes to safety-enhanced models (e.g., 77.39% ASR on instruction-tuned defenses). These results highlight a critical vulnerability in current alignment strategies and underscore the need for embedding-aware defenses.
Prospective Learning in Retrospect
Bai, Yuxin, Shuai, Cecelia, De Silva, Ashwin, Yu, Siyu, Chaudhari, Pratik, Vogelstein, Joshua T.
In most real-world applications of artificial intelligence, the distributions of the data and the goals of the learners tend to change over time. The Probably Approximately Correct (PAC) learning framework, which underpins most machine learning algorithms, fails to account for dynamic data distributions and evolving objectives, often resulting in suboptimal performance. Prospective learning is a recently introduced mathematical framework that overcomes some of these limitations. We build on this framework to present preliminary results that improve the algorithm and numerical results, and extend prospective learning to sequential decision-making scenarios, specifically foraging. Code is available at: https://github.com/neurodata/prolearn2.
Implementation and Assessment of an Augmented Training Curriculum for Surgical Robotics
Rota, Alberto, Fan, Ke, De Momi, Elena
--The integration of high-level assistance algorithms in surgical robotics training curricula may be beneficial in establishing a more comprehensive and robust skillset for aspiring surgeons, improving their clinical performance as a consequence. This work presents the development and validation of a haptic-enhanced Virtual Reality simulator for surgical robotics training, featuring 8 surgical tasks that the trainee can interact with thanks to the embedded physics engine. This virtual simulated environment is augmented by the introduction of high-level haptic interfaces for robotic assistance that aim at re-directing the motion of the trainee's hands and wrists toward targets or away from obstacles, and providing a quantitative performance score after the execution of each training exercise. An experimental study shows that the introduction of enhanced robotic assistance into a surgical robotics training curriculum improves performance during the training process and, crucially, promotes the transfer of the acquired skills to an unassisted surgical scenario, like the clinical one. The increase of surgical robotics procedures in the last decade demands a high number of trained surgeons [1] [2], capable of teleoperating such advanced and complex systems and at the same time able to take advantage of the benefits of Robot-Assisted Minimally Invasive Surgery (RAMIS) safely and effectively.
Adaptive Elicitation of Latent Information Using Natural Language
Wang, Jimmy, Zollo, Thomas, Zemel, Richard, Namkoong, Hongseok
Eliciting information to reduce uncertainty about a latent entity is a critical task in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. To harness the generalization power and world knowledge of LLMs in developing effective information-gathering strategies, we propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity. Since probabilistic modeling of an abstract latent entity is difficult, our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations and enable scalable uncertainty quantification over complex natural language. Through autoregressive forward simulation, our model quantifies how new questions reduce epistemic uncertainty, enabling the development of sophisticated information-gathering strategies to choose the most informative next queries. In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns and improving downstream predictions, illustrating the promise of strategic information gathering in natural language settings.
Assessing the Prevalence of AI-assisted Cheating in Programming Courses: A Pilot Study
Abstract-- Tools that can generate computer code in response to inputs written in natural language, such as ChatGPT, pose an existential threat to Computer Science education in its current form, since students can now use these tools to solve assignments without much effort. While that risk has already been recognized by scholars, the proportion of the student body that is incurring in this new kind of plagiarism is still an open problem. We conducted a pilot study in a large CS class (n=120) to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews. More than 25% of the survey respondents admitted to committing AI plagiarism. Conversely, only one student accepted to be interviewed. Given the high levels of misconduct acknowledgment, we conclude that surveys are an effective method for studies on the matter, while interviews should be avoided or designed in a way that can entice participation. 1 INTRODUCTION Generative artificial intelligence (GenAI, not to be confused with general The generation is usually guided by an input text known as the "prompt". For example, giving the prompt "a vase of red flowers" to a GenAI model would generate an image depicting red flowers in a vase. Practical applications of GenAI are now mainstream thanks to advances in neural networks. In particular, the clever use of attention mechanisms and the subsequent development of the transformer architecture made efficient learning possible over large text corpora (Vaswani et al., 2023) . AI application based on a LLM, can convincingly engage in a conversation and answer questions across multiple subjects (OpenAI, 2022) . Research on applications of LLMs in education is still in its infancy, but looks promising. Personal tutoring systems (Chang, 2022), content explanation (Leinonen et al., 2023) and assignment generation ( Jury et al., 2024) are a few of the ideas that have been explored. From another perspective, LLMs are already a reality in schools.
The AI Industry is Funding A Massive AI Training Initiative for Teachers
AI tools have become deeply embedded in how many students learn and complete schoolwork--and that usage is only poised to increase. On Tuesday, the American Federation of Teachers announced an AI training hub for educators, backed by 23 million from Microsoft, OpenAI, and Anthropic. The AFT is the second-largest teachers' union, representing 1.8 million teachers and educational staffers across the country. Their training hub will open in New York City this fall, featuring workshops that will educate teachers on how to use AI tools for tasks like generating lesson plans and quizzes, or writing emails to parents. Microsoft is providing 12.5 million for AI teacher training over the next five years.
Bridging Prediction and Intervention Problems in Social Systems
Liu, Lydia T., Raji, Inioluwa Deborah, Zhou, Angela, Guerdan, Luke, Hullman, Jessica, Malinsky, Daniel, Wilder, Bryan, Zhang, Simone, Adam, Hammaad, Coston, Amanda, Laufer, Ben, Nwankwo, Ezinne, Zanger-Tishler, Michael, Ben-Michael, Eli, Barocas, Solon, Feller, Avi, Gerchick, Marissa, Gillis, Talia, Guha, Shion, Ho, Daniel, Hu, Lily, Imai, Kosuke, Kapoor, Sayash, Loftus, Joshua, Nabi, Razieh, Narayanan, Arvind, Recht, Ben, Perdomo, Juan Carlos, Salganik, Matthew, Sendak, Mark, Tolbert, Alexander, Ustun, Berk, Venkatasubramanian, Suresh, Wang, Angelina, Wilson, Ashia
Many automated decision systems (ADS) are designed to solve prediction problems -- where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction systems operationalize holistic policy interventions in deployment. Once deployed, ADS can shape impacted population outcomes through an effective policy change in how decision-makers operate, while also being defined by past and present interactions between stakeholders and the limitations of existing organizational, as well as societal, infrastructure and context. In this work, we consider the ways in which we must shift from a prediction-focused paradigm to an interventionist paradigm when considering the impact of ADS within social systems. We argue this requires a new default problem setup for ADS beyond prediction, to instead consider predictions as decision support, final decisions, and outcomes. We highlight how this perspective unifies modern statistical frameworks and other tools to study the design, implementation, and evaluation of ADS systems, and point to the research directions necessary to operationalize this paradigm shift. Using these tools, we characterize the limitations of focusing on isolated prediction tasks, and lay the foundation for a more intervention-oriented approach to developing and deploying ADS.
Online Regularized Learning Algorithms in RKHS with $β$- and $ϕ$-Mixing Sequences
Roy, Priyanka, Saminger-Platz, Susanne
In this paper, we study an online regularized learning algorithm in a reproducing kernel Hilbert spaces (RKHS) based on a class of dependent processes. We choose such a process where the degree of dependence is measured by mixing coefficients. As a representative example, we analyze a strictly stationary Markov chain, where the dependence structure is characterized by the \(ϕ\)- and \(β\)-mixing coefficients. Under these assumptions, we derive probabilistic upper bounds as well as convergence rates for both the exponential and polynomial decay of the mixing coefficients.
SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning
Zhang, Ruiqi, Arora, Daman, Mei, Song, Zanette, Andrea
Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce Selective Prompting with Efficient Estimation of Difficulty (SPEED), an adaptive online RL curriculum that selectively chooses training examples of intermediate difficulty to maximize learning efficiency. Theoretically, we establish that intermediate-difficulty prompts improve the gradient estimator's signal-to-noise ratio, accelerating convergence. Empirically, our efficient implementation leads to 2x to 6x faster training without degrading accuracy, requires no manual tuning, and integrates seamlessly into standard RL algorithms.