AITopics

Country:

North America > United States > California (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing SystemsAug-22-2025, 00:36:57 GMT

Semi-Infinitely Constrained Markov Decision Processes

We also devise a reinforcement learning algorithm for SICMDPs that we call SI-CRL.

constraint, reinforcement, si-crl, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Ukraine > Kharkiv Oblast > Kharkiv (0.04)
(2 more...)

Genre: Research Report (0.68)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)

Neural Information Processing SystemsAug-22-2025, 00:28:34 GMT

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model Gen Li Tsinghua Y uting Wei CMU Y uejie Chi CMU Y uantao Gu Tsinghua Y uxin Chen Princeton

Despite a number of prior work tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined.

generative model, sample complexity, sample size, (11 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsAug-22-2025, 00:22:54 GMT

A Unifying Framework of Off-Policy General Value Function Evaluation

We propose a new algorithm called GenTD for off-policy GVFs evaluation and show that GenTD learns multiple interrelated multidimensional GVFs as efficiently as a single canonical scalar value function.

backward gvf, evaluation, gvf, (13 more...)

Country:

North America > Canada > Alberta (0.14)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining (0.67)

Neural Information Processing SystemsAug-22-2025, 00:22:50 GMT

57ef0373c890b30407eadfe6e06c8c84-Paper-Conference.pdf

backward gvf, evaluation, gvf, (13 more...)

Country:

North America > Canada > Alberta (0.14)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Nasir, Ahmed, Zenati, Abdelhafid

A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification

The application of reinforcement learning to safety-critical systems is limited by the lack of formal methods for verifying the robustness and safety of learned policies. This paper introduces a novel framework that addresses this gap by analyzing the combination of an RL agent and its environment as a discrete-time autonomous dynamical system. By leveraging tools from dynamical systems theory, specifically the Finite-Time Lyapunov Exponent (FTLE), we identify and visualize Lagrangian Coherent Structures (LCS) that act as the hidden "skeleton" governing the system's behavior. We demonstrate that repelling LCS function as safety barriers around unsafe regions, while attracting LCS reveal the system's convergence properties and potential failure modes, such as unintended "trap" states. To move beyond qualitative visualization, we introduce a suite of quantitative metrics, Mean Boundary Repulsion (MBR), Aggregated Spurious Attractor Strength (ASAS), and Temporally-Aware Spurious Attractor Strength (TASAS), to formally measure a policy's safety margin and robustness. We further provide a method for deriving local stability guarantees and extend the analysis to handle model uncertainty. Through experiments in both discrete and continuous control environments, we show that this framework provides a comprehensive and interpretable assessment of policy behavior, successfully identifying critical flaws in policies that appear successful based on reward alone.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2508.15588

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Universal Reinforcement Learning in Coalgebras: Asynchronous Stochastic Computation via Conduction

Mahadevan, Sridhar

In this paper, we introduce a categorial generalization of RL, termed universal reinforcement learning (URL), building on powerful mathematical abstractions from the study of coinduction on non-well-founded sets and universal coalgebras, topos theory, and categorial models of asynchronous parallel distributed computation. In the first half of the paper, we review the basic RL framework, illustrate the use of categories and functors in RL, showing how they lead to interesting insights. In particular, we also introduce a standard model of asynchronous distributed minimization proposed by Bertsekas and Tsitsiklis, and describe the relationship between metric coinduction and their proof of the Asynchronous Convergence Theorem. The space of algorithms for MDPs or PSRs can be modeled as a functor category, where the co-domain category forms a topos, which admits all (co)limits, possesses a subobject classifier, and has exponential objects. In the second half of the paper, we move on to universal coalgebras. Dynamical system models, such as Markov decision processes (MDPs), partially observed MDPs (POMDPs), a predictive state representation (PSRs), and linear dynamical systems (LDSs) are all special types of coalgebras. We describe a broad family of universal coalgebras, extending the dynamic system models studied previously in RL. The core problem in finding fixed points in RL to determine the exact or approximate (action) value function is generalized in URL to determining the final coalgebra asynchronously in a parallel distributed manner.

category, machine learning, reinforcement learning, (19 more...)

2508.15128

Country:

Europe (0.92)
North America > United States > Massachusetts (0.27)
North America > Canada > British Columbia (0.27)

Genre:

Research Report (1.00)
Overview (0.87)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Chandra, Joydeep, Manhas, Prabal, Kaur, Ramanjot, Sahay, Rashi

Aura-CAPTCHA: A Reinforcement Learning and GAN-Enhanced Multi-Modal CAPTCHA System

Aura-CAPTCHA was developed as a multi-modal CAPTCHA system to address vulnerabilities in traditional methods that are increasingly bypassed by AI technologies, such as Optical Character Recognition (OCR) and adversarial image processing. The design integrated Generative Adversarial Networks (GANs) for generating dynamic image challenges, Reinforcement Learning (RL) for adaptive difficulty tuning, and Large Language Models (LLMs) for creating text and audio prompts. Visual challenges included 3x3 grid selections with at least three correct images, while audio challenges combined randomized numbers and words into a single task. RL adjusted difficulty based on incorrect attempts, response time, and suspicious user behavior. Evaluations on real-world traffic demonstrated a 92% human success rate and a 10% bot bypass rate, significantly outperforming existing CAPTCHA systems. The system provided a robust and scalable approach for securing online applications while remaining accessible to users, addressing gaps highlighted in previous research.

large language model, machine learning, reinforcement learning, (18 more...)

2508.14976

Country: Asia > India (0.47)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Panagiotou, Maria, Brigato, Lorenzo, Streit, Vivien, Hayoz, Amanda, Proennecke, Stephan, Athanasopoulos, Stavros, Olsen, Mikkel T., Brok, Elizabeth J. den, Svensson, Cecilie H., Makrilakis, Konstantinos, Xatzipsalti, Maria, Vazeou, Andriani, Mertens, Peter R., Pedersen-Bjergaard, Ulrik, de Galan, Bastiaan E., Mougiakakou, Stavroula

Personalised Insulin Adjustment with Reinforcement Learning: An In-Silico Validation for People with Diabetes on Intensive Insulin Treatment

Despite recent advances in insulin preparations and technology, adjusting insulin remains an ongoing challenge for the majority of people with type 1 diabetes (T1D) and longstanding type 2 diabetes (T2D). In this study, we propose the Adaptive Basal-Bolus Advisor (ABBA), a personalised insulin treatment recommendation approach based on reinforcement learning for individuals with T1D and T2D, performing self-monitoring blood glucose measurements and multiple daily insulin injection therapy. We developed and evaluated the ability of ABBA to achieve better time-in-range (TIR) for individuals with T1D and T2D, compared to a standard basal-bolus advisor (BBA). The in-silico test was performed using an FDA-accepted population, including 101 simulated adults with T1D and 101 with T2D. An in-silico evaluation shows that ABBA significantly improved TIR and significantly reduced both times below- and above-range, compared to BBA. ABBA's performance continued to improve over two months, whereas BBA exhibited only modest changes. This personalised method for adjusting insulin has the potential to further optimise glycaemic control and support people with T1D and T2D in their daily self-management. Our results warrant ABBA to be trialed for the first time in humans.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

doi: 10.1109/ACCESS.2025.3600738

2505.14477

Country:

Europe > Switzerland (0.46)
Europe > Netherlands (0.28)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)