AITopics | Instructional Material

Instructional Material

Online Pre-Training for Offline-to-Online Reinforcement Learning

Shin, Yongjae, Kim, Jeonghye, Jung, Whiyoung, Hong, Sunghoon, Yoon, Deunsol, Jang, Youngsoo, Kim, Geonhyeong, Chae, Jongseong, Sung, Youngchul, Lee, Kanghoon, Lim, Woohyung

arXiv.org Artificial Intelligence, Jul-14-2025

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.

machine learning, online pre-training, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2507.08387

Genre:

Research Report > New Finding (1.00)
Instructional Material > Online (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Circumventing Safety Alignment in Large Language Models Through Embedding Space Toxicity Attenuation

Zhang, Zhibo, Li, Yuxi, Wang, Kailong, Yuan, Shuai, Shi, Ling, Wang, Haoyu

arXiv.org Artificial Intelligence, Jul-14-2025

Large Language Models (LLMs) have achieved remarkable success across domains such as healthcare, education, and cybersecurity. However, this openness also introduces significant security risks, particularly through embedding space poisoning, which is a subtle attack vector where adversaries manipulate the internal semantic representations of input data to bypass safety alignment mechanisms. While previous research has investigated universal perturbation methods, the dynamics of LLM safety alignment at the embedding level remain insufficiently understood. Consequently, more targeted and accurate adversarial perturbation techniques, which pose significant threats, have not been adequately studied. In this work, we propose ETTA (Embedding Transformation Toxicity Attenuation), a novel framework that identifies and attenuates toxicity-sensitive dimensions in embedding space via linear transformations. ETTA bypasses model refusal behaviors while preserving linguistic coherence, without requiring model fine-tuning or access to training data. Evaluated on five representative open-source LLMs using the AdvBench benchmark, ETTA achieves a high average attack success rate of 88.61%, outperforming the best baseline by 11.34%, and generalizes to safety-enhanced models (e.g., 77.39% ASR on instruction-tuned defenses). These results highlight a critical vulnerability in current alignment strategies and underscore the need for embedding-aware defenses.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.0802

Country:

Asia (0.93)
North America > United States > California (0.28)

Genre:

Research Report (1.00)
Instructional Material (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Prospective Learning in Retrospect

Bai, Yuxin, Shuai, Cecelia, De Silva, Ashwin, Yu, Siyu, Chaudhari, Pratik, Vogelstein, Joshua T.

arXiv.org Machine Learning, Jul-11-2025

In most real-world applications of artificial intelligence, the distributions of the data and the goals of the learners tend to change over time. The Probably Approximately Correct (PAC) learning framework, which underpins most machine learning algorithms, fails to account for dynamic data distributions and evolving objectives, often resulting in suboptimal performance. Prospective learning is a recently introduced mathematical framework that overcomes some of these limitations. We build on this framework to present preliminary results that improve the algorithm and numerical results, and extend prospective learning to sequential decision-making scenarios, specifically foraging. Code is available at: https://github.com/neurodata/prolearn2.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2507.07965

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.47)

Industry: Education > Educational Setting (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Implementation and Assessment of an Augmented Training Curriculum for Surgical Robotics

Rota, Alberto, Fan, Ke, De Momi, Elena

arXiv.org Artificial Intelligence, Jul-11-2025

The integration of high-level assistance algorithms in surgical robotics training curricula may be beneficial in establishing a more comprehensive and robust skillset for aspiring surgeons, improving their clinical performance as a consequence. This work presents the development and validation of a haptic-enhanced Virtual Reality simulator for surgical robotics training, featuring 8 surgical tasks that the trainee can interact with thanks to the embedded physics engine. This virtual simulated environment is augmented by the introduction of high-level haptic interfaces for robotic assistance that aim at re-directing the motion of the trainee's hands and wrists toward targets or away from obstacles, and providing a quantitative performance score after the execution of each training exercise. An experimental study shows that the introduction of enhanced robotic assistance into a surgical robotics training curriculum improves performance during the training process and, crucially, promotes the transfer of the acquired skills to an unassisted surgical scenario, like the clinical one. The increase of surgical robotics procedures in the last decade demands a high number of trained surgeons [1] [2], capable of teleoperating such advanced and complex systems and at the same time able to take advantage of the benefits of Robot-Assisted Minimally Invasive Surgery (RAMIS) safely and effectively.

artificial intelligence, assistance, human computer interaction, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICRA57147.2024.10610411

2507.07718

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)
Education (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Adaptive Elicitation of Latent Information Using Natural Language

Wang, Jimmy, Zollo, Thomas, Zemel, Richard, Namkoong, Hongseok

arXiv.org Artificial Intelligence, Jul-10-2025

Eliciting information to reduce uncertainty about a latent entity is a critical task in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. To harness the generalization power and world knowledge of LLMs in developing effective information-gathering strategies, we propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity. Since probabilistic modeling of an abstract latent entity is difficult, our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations and enable scalable uncertainty quantification over complex natural language. Through autoregressive forward simulation, our model quantifies how new questions reduce epistemic uncertainty, enabling the development of sophisticated information-gathering strategies to choose the most informative next queries. In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns and improving downstream predictions, illustrating the promise of strategic information gathering in natural language settings.

large language model, latent entity, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2504.04204

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.48)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Assessing the Prevalence of AI-assisted Cheating in Programming Courses: A Pilot Study

Delphino, Kaléu

arXiv.org Artificial Intelligence, Jul-10-2025

Tools that can generate computer code in response to inputs written in natural language, such as ChatGPT, pose an existential threat to Computer Science education in its current form, since students can now use these tools to solve assignments without much effort. While that risk has already been recognized by scholars, the proportion of the student body that is engaging in this new kind of plagiarism is still an open problem. We conducted a pilot study in a large CS class (n=120) to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews. More than 25% of the survey respondents admitted to committing AI plagiarism. Conversely, only one student agreed to be interviewed. Given the high levels of misconduct acknowledgment, we conclude that surveys are an effective method for studies on the matter, while interviews should be avoided or designed in a way that can entice participation.

1 INTRODUCTION

Generative artificial intelligence (GenAI, not to be confused with general [...] The generation is usually guided by an input text known as the "prompt". For example, giving the prompt "a vase of red flowers" to a GenAI model would generate an image depicting red flowers in a vase. Practical applications of GenAI are now mainstream thanks to advances in neural networks. In particular, the clever use of attention mechanisms and the subsequent development of the transformer architecture made efficient learning possible over large text corpora (Vaswani et al., 2023). [...] AI application based on an LLM, can convincingly engage in a conversation and answer questions across multiple subjects (OpenAI, 2022). Research on applications of LLMs in education is still in its infancy, but looks promising. Personal tutoring systems (Chang, 2022), content explanation (Leinonen et al., 2023) and assignment generation (Jury et al., 2024) are a few of the ideas that have been explored. From another perspective, LLMs are already a reality in schools.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.06438

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.93)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Education > Educational Setting (0.93)
Education > Curriculum > Subject-Specific Education (0.49)
Education > Educational Technology > Educational Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)

The AI Industry is Funding A Massive AI Training Initiative for Teachers

TIME - Tech, Jul-9-2025, 18:01:00 GMT

AI tools have become deeply embedded in how many students learn and complete schoolwork, and that usage is only poised to increase. On Tuesday, the American Federation of Teachers announced an AI training hub for educators, backed by $23 million from Microsoft, OpenAI, and Anthropic. The AFT is the second-largest teachers' union, representing 1.8 million teachers and educational staffers across the country. Their training hub will open in New York City this fall, featuring workshops that will educate teachers on how to use AI tools for tasks like generating lesson plans and quizzes, or writing emails to parents. Microsoft is providing $12.5 million for AI teacher training over the next five years.

ai tool, massive ai training initiative, student, (15 more...)

TIME - Tech

Country:

North America > United States > New York (0.25)
North America > United States > Florida > Miami-Dade County (0.05)

Genre: Instructional Material (0.57)

Industry:

Education > Finance > Teachers Union (0.57)
Education > Educational Setting > Online (0.57)
Government > Regional Government > North America Government > United States Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

Bridging Prediction and Intervention Problems in Social Systems

Liu, Lydia T., Raji, Inioluwa Deborah, Zhou, Angela, Guerdan, Luke, Hullman, Jessica, Malinsky, Daniel, Wilder, Bryan, Zhang, Simone, Adam, Hammaad, Coston, Amanda, Laufer, Ben, Nwankwo, Ezinne, Zanger-Tishler, Michael, Ben-Michael, Eli, Barocas, Solon, Feller, Avi, Gerchick, Marissa, Gillis, Talia, Guha, Shion, Ho, Daniel, Hu, Lily, Imai, Kosuke, Kapoor, Sayash, Loftus, Joshua, Nabi, Razieh, Narayanan, Arvind, Recht, Ben, Perdomo, Juan Carlos, Salganik, Matthew, Sendak, Mark, Tolbert, Alexander, Ustun, Berk, Venkatasubramanian, Suresh, Wang, Angelina, Wilson, Ashia

arXiv.org Machine Learning, Jul-9-2025

Many automated decision systems (ADS) are designed to solve prediction problems, where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction systems operationalize holistic policy interventions in deployment. Once deployed, ADS can shape impacted population outcomes through an effective policy change in how decision-makers operate, while also being defined by past and present interactions between stakeholders and the limitations of existing organizational, as well as societal, infrastructure and context. In this work, we consider the ways in which we must shift from a prediction-focused paradigm to an interventionist paradigm when considering the impact of ADS within social systems. We argue this requires a new default problem setup for ADS beyond prediction, to instead consider predictions as decision support, final decisions, and outcomes. We highlight how this perspective unifies modern statistical frameworks and other tools to study the design, implementation, and evaluation of ADS, and point to the research directions necessary to operationalize this paradigm shift. Using these tools, we characterize the limitations of focusing on isolated prediction tasks, and lay the foundation for a more intervention-oriented approach to developing and deploying ADS.

data mining, decision support system, machine learning, (22 more...)

arXiv.org Machine Learning

2507.05216

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(19 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.93)
Research Report > New Finding (0.92)
Instructional Material (0.92)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(10 more...)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Decision Support Systems (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)

Online Regularized Learning Algorithms in RKHS with $β$- and $ϕ$-Mixing Sequences

Roy, Priyanka, Saminger-Platz, Susanne

arXiv.org Machine Learning, Jul-9-2025

In this paper, we study an online regularized learning algorithm in a reproducing kernel Hilbert space (RKHS) based on a class of dependent processes. We choose such a process where the degree of dependence is measured by mixing coefficients. As a representative example, we analyze a strictly stationary Markov chain, where the dependence structure is characterized by the $ϕ$- and $β$-mixing coefficients. Under these assumptions, we derive probabilistic upper bounds as well as convergence rates for both the exponential and polynomial decay of the mixing coefficients.
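To make the phrase "online regularized learning algorithm in an RKHS" concrete, here is a minimal sketch. It assumes the commonly used least-squares update f_{t+1} = f_t - γ_t[(f_t(x_t) - y_t) K(x_t, ·) + λ f_t]; the abstract does not state the exact update or step-size schedule, so this form, the Gaussian kernel, and the names `OnlineKernelRegressor`, `lam`, and `step` are illustrative assumptions rather than the authors' implementation.

```python
import math


def gaussian_kernel(x, z, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the kernel choice is an assumption."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


class OnlineKernelRegressor:
    """Hypothetical helper: f = sum_i coeffs[i] * K(points[i], .)."""

    def __init__(self, lam=0.01, kernel=gaussian_kernel):
        self.lam = lam          # regularization parameter (lambda)
        self.kernel = kernel
        self.points = []        # inputs seen so far
        self.coeffs = []        # expansion coefficients, one per input

    def predict(self, x):
        return sum(a * self.kernel(z, x) for a, z in zip(self.coeffs, self.points))

    def update(self, x, y, step):
        """One online step on the sample (x, y) with step size `step`."""
        residual = self.predict(x) - y
        # The -step * lam * f_t term shrinks every existing coefficient.
        shrink = 1.0 - step * self.lam
        self.coeffs = [shrink * a for a in self.coeffs]
        # The -step * residual * K(x, .) term adds one new expansion point.
        self.points.append(x)
        self.coeffs.append(-step * residual)
```

In the setting the abstract describes, the samples passed to `update` would arrive one at a time from a dependent process such as a stationary Markov chain instead of being drawn independently.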

artificial intelligence, machine learning, markov chain, (17 more...)

arXiv.org Machine Learning

2507.05929

Country:

Europe > Austria > Upper Austria > Linz (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.64)
Instructional Material > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning

Zhang, Ruiqi, Arora, Daman, Mei, Song, Zanette, Andrea

arXiv.org Artificial Intelligence, Jul-9-2025

Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce Selective Prompting with Efficient Estimation of Difficulty (SPEED), an adaptive online RL curriculum that selectively chooses training examples of intermediate difficulty to maximize learning efficiency. Theoretically, we establish that intermediate-difficulty prompts improve the gradient estimator's signal-to-noise ratio, accelerating convergence. Empirically, our efficient implementation leads to 2x to 6x faster training without degrading accuracy, requires no manual tuning, and integrates seamlessly into standard RL algorithms.
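The idea of keeping only intermediate-difficulty prompts can be illustrated with a short sketch. It assumes difficulty is estimated as the empirical pass rate over a few probe attempts and that a prompt counts as intermediate when that rate is strictly between two thresholds. The function names, the `attempt` callback, and the default values are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence


def estimate_pass_rate(prompt: str, attempt: Callable[[str], bool], n_probe: int) -> float:
    """Fraction of n_probe sampled attempts that the verifier accepts."""
    return sum(attempt(prompt) for _ in range(n_probe)) / n_probe


def select_intermediate(
    prompts: Sequence[str],
    attempt: Callable[[str], bool],
    n_probe: int = 4,
    low: float = 0.0,
    high: float = 1.0,
) -> List[str]:
    """Keep prompts whose estimated pass rate lies strictly between low and high.

    With the default thresholds this drops prompts that were always solved or
    never solved during probing, since those carry little learning signal.
    """
    kept = []
    for prompt in prompts:
        rate = estimate_pass_rate(prompt, attempt, n_probe)
        if low < rate < high:
            kept.append(prompt)
    return kept
```

Here `attempt` stands in for sampling one model response and checking it against a verifiable reward. A real pipeline would likely reuse the probe rollouts for training instead of discarding them.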

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.09016

Country: North America > United States (0.93)

Genre:

Research Report (1.00)
Instructional Material > Online (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
