AITopics | actor critic

Real-Time Reinforcement Learning

Neural Information Processing Systems | Dec-25-2025, 09:51:19 GMT

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world, safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and show how it relates to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm, Soft Actor Critic, in both real-time and non-real-time settings.
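
The idea that states and actions evolve simultaneously can be illustrated with a small wrapper in which an action chosen now only takes effect on the following step, and the observation is extended with the action that is still pending. This is a minimal sketch under stated assumptions, not the authors' implementation: `ToyEnv`, `RealTimeWrapper`, and the one-step delay are hypothetical names and choices made for illustration only.

```python
class ToyEnv:
    """Hypothetical 1-D environment: the state is a position, the action a displacement."""

    def __init__(self):
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        reward = -abs(self.pos)  # reward is higher the closer we stay to the origin
        return self.pos, reward


class RealTimeWrapper:
    """Assumed illustration of a real-time interaction loop.

    The environment advances while the agent is still computing, so the
    action submitted at step t is only applied at step t + 1. The
    observation is the pair (state, pending_action) so that the agent can
    see which action is already committed.
    """

    def __init__(self, env, default_action=0.0):
        self.env = env
        self.default_action = default_action
        self.pending = default_action

    def reset(self):
        self.pending = self.default_action
        return (self.env.reset(), self.pending)

    def step(self, action):
        # Apply the action selected during the previous step ...
        state, reward = self.env.step(self.pending)
        # ... and queue the newly selected action for the next step.
        self.pending = action
        return (state, self.pending), reward


env = RealTimeWrapper(ToyEnv())
obs = env.reset()
first_obs, first_reward = env.step(1.0)     # the default action is applied, 1.0 is queued
second_obs, second_reward = env.step(-1.0)  # the queued 1.0 is applied now
```

In this sketch the first call to `step` leaves the position unchanged, because only the default action has been applied; the effect of the agent's first choice appears one step later.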

name change, real-time reinforcement learning, reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)


Real-Time Reinforcement Learning

Neural Information Processing Systems | Feb-11-2025, 21:50:51 GMT

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world, safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and show how it relates to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm, Soft Actor Critic, in both real-time and non-real-time settings.

actor critic, real-time reinforcement learning, reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)


Actor Critic with Experience Replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy

Abrar, Md Mainul, Sapkota, Parvat, Sprouts, Damon, Jia, Xun, Chi, Yujie

arXiv.org Artificial Intelligence | Feb-1-2025

Background: Real-time treatment planning in intensity-modulated radiotherapy (IMRT) is challenging due to complex beam interactions. AI has improved automation, but existing models require large, high-quality datasets and lack universal applicability. Deep reinforcement learning (DRL) offers a promising alternative by mimicking human trial-and-error planning. Purpose: Develop a stochastic policy-based DRL agent for automatic treatment planning with efficient training, broad applicability, and robustness against adversarial attacks generated with the Fast Gradient Sign Method (FGSM). Methods: Using the Actor-Critic with Experience Replay (ACER) architecture, the agent tunes treatment planning parameters (TPPs) in inverse planning. Training is based on prostate cancer IMRT cases, using dose-volume histograms (DVHs) as input. The model is trained on a single patient case, validated on two independent cases, and tested on 300+ plans across three datasets. Plan quality is assessed using ProKnow scores, and robustness is tested against adversarial attacks. Results: Despite training on a single case, the model generalizes well. Before ACER-based planning, the mean plan score was 6.20$\pm$1.84; after, 93.09% of cases achieved a perfect score of 9, with a mean of 8.93$\pm$0.27. The agent effectively prioritizes optimal TPP tuning and remains robust against adversarial attacks. Conclusions: The ACER-based DRL agent enables efficient, high-quality treatment planning in prostate cancer IMRT, demonstrating strong generalizability and robustness.
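
For readers unfamiliar with FGSM, the attack perturbs an input by a small step in the direction of the sign of the loss gradient with respect to that input. The sketch below is illustrative only and assumes a tiny logistic model as a stand-in; the paper's agent, its DVH inputs, and its attack settings are not reproduced here, and the function names `fgsm_perturb` and `logistic_input_gradient` are hypothetical.

```python
import numpy as np


def logistic_input_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    Assumes a stand-in logistic model p = sigmoid(w . x + b); for this
    model the input gradient is (p - y) * w.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w


def fgsm_perturb(x, grad, epsilon):
    """FGSM step: move each input component by epsilon along the gradient sign."""
    return x + epsilon * np.sign(grad)


# Toy values chosen for illustration only.
x = np.array([0.5, -0.2])
w = np.array([1.0, -2.0])
b = 0.0
y = 1.0

grad = logistic_input_gradient(x, y, w, b)
x_adv = fgsm_perturb(x, grad, epsilon=0.1)
```

With these toy values the perturbed input lowers the model's logit for the true class, which is the loss-increasing behaviour the attack is designed to produce.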

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2502.00346

Country:

North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Florida > Seminole County > Sanford (0.04)
(2 more...)

Genre:

Workflow (0.93)
Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


A Small Gain Analysis of Single Timescale Actor Critic

Olshevsky, Alex, Gharesifard, Bahman

arXiv.org Artificial Intelligence | May-25-2023

We consider a version of actor-critic that uses proportional step-sizes and only one critic update, with a single sample from the stationary distribution, per actor step. We provide an analysis of this method using the small-gain theorem. Specifically, we prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods to $O\left(\mu^{-2} \epsilon^{-2}\right)$ for finding an $\epsilon$-approximate stationary point, where $\mu$ is the condition number associated with the critic.

lemma 5, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2203.02591

Country: North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)


Climate Change Policy Exploration using Reinforcement Learning

Wolf, Theodore

arXiv.org Artificial Intelligence | Oct-23-2022

Climate change is an incredibly complicated problem facing humanity. When many variables interact with each other, it can be difficult for humans to grasp the causes and effects of such a large-scale problem. The climate is a dynamical system, where small changes can have considerable and unpredictable repercussions in the long term. Understanding how to nudge this system in the right ways could help us find creative solutions to climate change. In this research, we combine Deep Reinforcement Learning and a World-Earth system model to find, and explain, creative strategies for reaching a sustainable future. This work extends the method and analysis of Strnad et al. in multiple directions. We use four different Reinforcement Learning agents, varying in complexity, to probe the environment in different ways and to find various strategies. The environment is a low-complexity World-Earth system model where the goal is to reach, by enacting different policies, a future in which all the energy for the economy is produced by renewables. We use a reward function based on planetary boundaries that we modify to force the agents to find a wider range of strategies. To favour applicability, we slightly modify the environment, by injecting noise and making it fully observable, to understand the impact of these factors on the learning of the agents.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2211.17013

Country:

North America > United States (0.46)
Europe > Germany > Brandenburg > Potsdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Leisure & Entertainment (1.00)
Energy > Renewable (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Deep Reinforcement Learning 2.0

#artificialintelligence | Dec-29-2021, 00:46:30 GMT

Welcome to Deep Reinforcement Learning 2.0! In this course, we will learn and implement a new, incredibly smart AI model called the Twin-Delayed DDPG, which combines state-of-the-art techniques in Artificial Intelligence, including continuous Double Deep Q-Learning, Policy Gradient, and Actor Critic. The model is so strong that, for the first time in our courses, we are able to solve the most challenging virtual AI applications (training an ant/spider and a half humanoid to walk and run across a field). In this part we will study all the fundamentals of Artificial Intelligence that will allow you to understand and master the AI of this course. These include Q-Learning, Deep Q-Learning, Policy Gradient, Actor-Critic and more.
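
As a rough illustration of the "twin" part of Twin-Delayed DDPG, the critic target takes the smaller of two target-critic estimates, and the target action is smoothed with clipped noise. The sketch below is not taken from the course material; the function names and the default values for `gamma`, `noise_clip`, and `max_action` are assumptions chosen for the example.

```python
import numpy as np


def smoothed_target_action(mu_next, noise, noise_clip=0.5, max_action=1.0):
    """Target policy smoothing: add clipped noise, then clip to the action range."""
    clipped_noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(mu_next + clipped_noise, -max_action, max_action)


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the minimum of the two critics.

    `done` is 1.0 for terminal transitions, which removes the bootstrap term.
    """
    return reward + gamma * (1.0 - done) * np.minimum(next_q1, next_q2)


# Toy numbers for illustration only.
action = smoothed_target_action(np.array([0.9]), np.array([2.0]))
target = td3_target(
    reward=np.array([1.0]),
    done=np.array([0.0]),
    next_q1=np.array([2.0]),
    next_q2=np.array([3.0]),
    gamma=0.5,
)
```

Taking the minimum of the two critic estimates is what counteracts the overestimation that a single critic tends to accumulate; the "delayed" part, updating the actor less often than the critics, is not shown here.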

deep q-learning, deep reinforcement learning 2, policy gradient, (10 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.53)

Industry: Education > Educational Setting > Online (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Risk Conditioned Neural Motion Planning

Huang, Xin, Feng, Meng, Jasour, Ashkan, Rosman, Guy, Williams, Brian

arXiv.org Artificial Intelligence | Aug-4-2021

Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. While existing mathematical programming methods offer theoretical guarantees in the context of constrained Markov decision processes, they either lack scalability in solving larger problems or produce conservative plans. Recent advances in deep reinforcement learning improve scalability by learning policy networks as function approximators. In this paper, we propose an extension of the soft actor-critic model that estimates the execution risk of a plan through a risk critic and produces risk-bounded policies efficiently by adding an extra risk term to the loss function of the policy network. We define the execution risk in an accurate form, as opposed to approximating it through a summation of immediate risks at each time step, which leads to conservative plans. Our proposed model is conditioned on a continuous spectrum of risk bounds, allowing the user to adjust the agent's level of risk aversion on the fly. Through a set of experiments, we show the advantage of our model in terms of both computational time and plan quality, compared to a state-of-the-art mathematical programming baseline, and validate its performance in more complicated scenarios, including nonlinear dynamics and larger state spaces.

constraint, execution risk, probability, (16 more...)

arXiv.org Artificial Intelligence

2108.01851

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.88)
(2 more...)


Real-Time Reinforcement Learning

Ramstedt, Simon, Pal, Chris

Neural Information Processing Systems | Mar-18-2020, 21:33:41 GMT

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world, safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and show how it relates to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm, Soft Actor Critic, in both real-time and non-real-time settings.

actor critic, real-time reinforcement learning, reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)


Deep Reinforcement Learning 2.0

#artificialintelligence | Feb-1-2020, 08:11:20 GMT

Welcome to Deep Reinforcement Learning 2.0! In this course, we will learn and implement a new, incredibly smart AI model called the Twin-Delayed DDPG, which combines state-of-the-art techniques in Artificial Intelligence, including continuous Double Deep Q-Learning, Policy Gradient, and Actor Critic. The model is so strong that, for the first time in our courses, we are able to solve the most challenging virtual AI applications (training an ant/spider and a half humanoid to walk and run across a field). In this part we will study all the fundamentals of Artificial Intelligence that will allow you to understand and master the AI of this course. These include Q-Learning, Deep Q-Learning, Policy Gradient, Actor-Critic and more.

deep q-learning, deep reinforcement learning 2, policy gradient, (10 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.53)

Industry: Education > Educational Setting > Online (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Reinforcement Learning is full of Manipulative Consultants

#artificialintelligence | Jan-7-2020, 10:55:05 GMT

Imagine you go to an investment consultant, and you first ask how he charges. Is it according to the profit you'll make? "The more accurate I am in my predictions of your returns, you'll pay me more. But I will be tested only on the investments you choose to make." This smells a bit fishy, and you start sniffing around for other people who are using this consultant. Turns out he recommended them all only government bonds with low return and low variability.

manipulative consultant, reinforcement learning, variance, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
