AITopics | soft actor-critic

Collaborating Authors

soft actor-critic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Continuous Soft Actor-Critic: An Off-Policy Learning Method Robust to Time Discretization

Neural Information Processing SystemsJun-18-2026, 03:28:04 GMT

Many Deep Reinforcement Learning (DRL) algorithms are sensitive to time discretization, which reduces their performance in real-world scenarios. We propose Continuous Soft Actor-Critic, an off-policy actor-critic DRL algorithm in continuous time and space. It is robust to environment time discretization. We also extend the framework to multi-agent scenarios. This Multi-Agent Reinforcement Learning (MARL) algorithm is suitable for both competitive and cooperative settings.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology (0.92)
Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reduced-Order Model-Guided Reinforcement Learning for Demonstration-Free Humanoid Locomotion

Liu, Shuai, Lau, Meng Cheng

arXiv.org Artificial IntelligenceSep-24-2025

We introduce Reduced-Order Model-Guided Reinforcement Learning (ROM-GRL), a two-stage reinforcement learning framework for humanoid walking that requires no motion capture data or elaborate reward shaping. In the first stage, a compact 4-DOF (four-degree-of-freedom) reduced-order model (ROM) is trained via Proximal Policy Optimization. This generates energy-efficient gait templates. In the second stage, those dynamically consistent trajectories guide a full-body policy trained with Soft Actor--Critic augmented by an adversarial discriminator, ensuring the student's five-dimensional gait feature distribution matches the ROM's demonstrations. Experiments at 1 meter-per-second and 4 meter-per-second show that ROM-GRL produces stable, symmetric gaits with substantially lower tracking error than a pure-reward baseline. By distilling lightweight ROM guidance into high-dimensional policies, ROM-GRL bridges the gap between reward-only and imitation-based locomotion methods, enabling versatile, naturalistic humanoid behaviors without any human demonstrations.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2509.19023

Genre: Research Report (0.65)

Industry:

Leisure & Entertainment (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Quantum deep reinforcement learning for humanoid robot navigation task

Lokossou, Romerik, Girma, Birhanu Shimelis, Tonguz, Ozan K., Biyabani, Ahmed

arXiv.org Artificial IntelligenceSep-16-2025

Abstract--Classical reinforcement learning (RL) methods often struggle in complex, high-dimensional environments because of their extensive parameter requirements and challenges posed by stochastic, non-deterministic settings. This study introduces quantum deep reinforcement learning (QDRL) to train humanoid agents efficiently. While previous quantum RL models focused on smaller environments, such as wheeled robots and robotic arms, our work pioneers the application of QDRL to humanoid robotics, specifically in environments with substantial observation and action spaces, such as MuJoCo's Humanoid-v4 and Walker2d-v4. Using parameterized quantum circuits, we explored a hybrid quantum-classical setup to directly navigate high-dimensional state spaces, bypassing traditional mapping and planning. By integrating quantum computing with deep RL, we aim to develop models that can efficiently learn complex navigation tasks in humanoid robots. We evaluated the performance of the Soft Actor-Critic (SAC) in classical RL against its quantum implementation. The results show that the quantum SAC achieves an 8% higher average return (246.40)

machine learning, reinforcement, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2509.11388

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Sequence Aware SAC Control for Engine Fuel Consumption Optimization in Electrified Powertrain

Jaleel, Wafeeq, Rownak, Md Ragib, Hanif, Athar, Bhatti, Sidra Ghayour, Ahmed, Qadeer

arXiv.org Artificial IntelligenceAug-8-2025

As hybrid electric vehicles (HEVs) gain traction in heavy-duty trucks, adaptive and efficient energy management is critical on reducing fuel consumption while maintaining battery charge for long operation times. We present a new reinforcement learning (RL) framework based on the Soft Actor-Critic (SAC) algorithm to optimize engine control in series HEVs. We reformulate the control task as a sequential decision-making problem and enhance SAC by incorporating Gated Recurrent Units (GRUs) and Decision Transformers (DTs) into both actor and critic networks to capture temporal dependencies and improve planning over time. To evaluate robustness and generalization, we train the models under diverse initial battery states, drive cycle durations, power demands, and input sequence lengths. Experiments show that the SAC agent with a DT -based actor and GRU-based critic was within 1.8% of Dynamic Programming (DP) in fuel savings on the Highway Fuel Economy Test (HFET) cycle, while the SAC agent with GRUs in both actor and critic networks, and FFN actor-critic agent were within 3.16% and 3.43%, respectively. On unseen drive cycles (US06 and Heavy Heavy-Duty Diesel Truck (HHDDT) cruise segment), generalized sequence-aware agents consistently outperformed feedfor-ward network (FFN)-based agents, highlighting their adaptability and robustness in real-world settings.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2508.04874

Country: North America > United States (0.95)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Energy (1.00)
Automobiles & Trucks (1.00)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic

Vo, Thanh Vinh, Lee, Young, Ma, Haozhe, Lu, Chien, Leong, Tze-Yun

arXiv.org Artificial IntelligenceJun-9-2025

Hidden confounders that influence both states and actions can bias policy learning in reinforcement learning (RL), leading to suboptimal or non-generalizable behavior. Most RL algorithms ignore this issue, learning policies from observational trajectories based solely on statistical associations rather than causal effects. We propose DoSAC (Do-Calculus Soft Actor-Critic with Backdoor Adjustment), a principled extension of the SAC algorithm that corrects for hidden confounding via causal intervention estimation. DoSAC estimates the interventional policy $π(a | \mathrm{do}(s))$ using the backdoor criterion, without requiring access to true confounders or causal labels. To achieve this, we introduce a learnable Backdoor Reconstructor that infers pseudo-past variables (previous state and action) from the current state to enable backdoor adjustment from observational data. This module is integrated into a soft actor-critic framework to compute both the interventional policy and its entropy. Empirical results on continuous control benchmarks show that DoSAC outperforms baselines under confounded settings, with improved robustness, generalization, and policy reliability.

confounder, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2506.05445

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving

Wasif, Dawood, Moore, Terrence J, Reddy, Chandan K, Cho, Jin-Hee

arXiv.org Artificial IntelligenceJun-3-2025

Recent advances in autonomous vehicles have shifted development from rigid pipelines to end-to-end neural policies mapping raw sensor streams directly to control commands [1-3]. While these models offer streamlined architectures and strong benchmark performance, they raise critical deployment concerns. Their internal logic is opaque, complicating validation in safety-critical settings. They struggle to generalize to rare events like severe weather or infrastructure damage and lack formal guarantees on kinematic properties such as speed limits and lane-keeping. Further, they provide no natural interface for human oversight or explanation. These challenges motivate frameworks that combine deep network expressiveness with transparency, robustness, and provable safety. Meanwhile, Large Language Models (LLMs) and Vision Language Models (VLMs) have demonstrated human-level reasoning and visual grounding [4-6]. Recent works like VLM-SR (Shaped Rewards) [7], VLM-RM (Reward Models) [8], and RoboCLIP (Language-Conditioned Robot Learning via Contrastive Language-Image Pretraining) [9] inject semantic feedback into Reinforcement Learning (RL), but rely on static prompts unsuited to evolving road conditions and overlook vehicle dynamics.

large language model, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2506.00819

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Learning-based Autonomous Oversteer Control and Collision Avoidance

Lee, Seokjun, Kong, Seung-Hyun

arXiv.org Artificial IntelligenceMay-22-2025

Oversteer, wherein a vehicle's rear tires lose traction and induce unintentional excessive yaw, poses critical safety challenges. Failing to control oversteer often leads to severe traffic accidents. Although recent autonomous driving efforts have attempted to handle oversteer through stabilizing maneuvers, the majority rely on expert-defined trajectories or assume obstacle-free environments, limiting real-world applicability. This paper introduces a novel end-to-end (E2E) autonomous driving approach that tackles oversteer control and collision avoidance simultaneously. Existing E2E techniques, including Imitation Learning (IL), Reinforcement Learning (RL), and Hybrid Learning (HL), generally require near-optimal demonstrations or extensive experience. Yet even skilled human drivers struggle to provide perfect demonstrations under oversteer, and high transition variance hinders accumulating sufficient data. Hence, we present Q-Compared Soft Actor-Critic (QC-SAC), a new HL algorithm that effectively learns from suboptimal demonstration data and adapts rapidly to new conditions. To evaluate QC-SAC, we introduce a benchmark inspired by real-world driver training: a vehicle encounters sudden oversteer on a slippery surface and must avoid randomly placed obstacles ahead. Experimental results show QC-SAC attains near-optimal driving policies, significantly surpassing state-of-the-art IL, RL, and HL baselines. Our method demonstrates the world's first safe autonomous oversteer control with obstacle avoidance.

demonstration data, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2505.15275

Country:

North America > United States (0.67)
Asia > South Korea (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment > Sports > Motorsports (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Add feedback

Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

Yuan, Xinyi, Shang, Zhiwei, Huang, Wenjun, Cui, Yunduan, Chen, Di, Zhu, Meixin

arXiv.org Artificial IntelligenceMay-7-2025

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning stability, and sample efficiency remains a significant challenge. Traditional methods such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) address these issues by incorporating entropy or relative entropy regularization, but often face problems of instability and low sample efficiency. In this paper, we propose the Conservative Soft Actor-Critic (CSAC) algorithm, which seamlessly integrates entropy and relative entropy regularization within the AC framework. CSAC improves exploration through entropy regularization while avoiding overly aggressive policy updates with the use of relative entropy regularization. Evaluations on benchmark tasks and real-world robotic simulations demonstrate that CSAC offers significant improvements in stability and efficiency over existing methods. These findings suggest that CSAC provides strong robustness and application potential in control tasks under dynamic environments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2505.03356

Country: Asia > China > Guangdong Province (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (0.46)
Information Technology > Robotics & Automation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Add feedback

CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration

Yang, Chunyu, Bi, Shengben, Xu, Yihui, Zhang, Xin

arXiv.org Artificial IntelligenceMar-18-2025

With the increasing demand for efficient and flexible robotic exploration solutions, Reinforcement Learning (RL) is becoming a promising approach in the field of autonomous robotic exploration. However, current RL-based exploration algorithms often face limited environmental reasoning capabilities, slow convergence rates, and substantial challenges in Sim-To-Real (S2R) transfer. To address these issues, we propose a Curriculum Learning-based Transformer Reinforcement Learning Algorithm (CTSAC) aimed at improving both exploration efficiency and transfer performance. To enhance the robot's reasoning ability, a Transformer is integrated into the perception network of the Soft Actor-Critic (SAC) framework, leveraging historical information to improve the farsightedness of the strategy. A periodic review-based curriculum learning is proposed, which enhances training efficiency while mitigating catastrophic forgetting during curriculum transitions. Training is conducted on the ROS-Gazebo continuous robotic simulation platform, with LiDAR clustering optimization to further reduce the S2R gap. Experimental results demonstrate the CTSAC algorithm outperforms the state-of-the-art non-learning and learning-based algorithms in terms of success rate and success rate-weighted exploration time. Moreover, real-world experiments validate the strong S2R transfer capabilities of CTSAC.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2503.14254

Country:

Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Jiangsu Province > Xuzhou (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns

Tian, Dong, Li, Ge, Zhou, Hongyi, Celik, Onur, Neumann, Gerhard

arXiv.org Artificial IntelligenceMar-6-2025

Unlike traditional methods that focus on evaluating single state-action pairs or apply action chunking in the actor network, this approach feeds chunked actions directly into the critic. Leveraging the Transformer's strength in processing sequential data, the proposed architecture achieves more robust value estimation. Empirical evaluations demonstrate that this method leads to efficient and stable training, particularly excelling in environments with sparse rewards or Multi-Phase tasks.Contribution(s) 1. We present a novel critic architecture for SAC that leverages Transformers to process sequential information, resulting in more accurate value estimations. Context: Transformer-Based Critic Network 2. We introduce a method for incorporating N-Step returns into the critic network in a stable and efficient manner, effectively mitigating the common challenges of variance and importance sampling associated with N-returns. Context: Stable Integration of N-Returns 3. We shift action chunking from the actor to the critic, demonstrating that enhanced temporal reasoning at the critic level--beyond traditional actor-side exploration--drives performance improvements in sparse and multi-phase tasks. Context: Unlike previous approaches that focus on actor-side chunking for exploration, our Transformer-based critic network produces a smooth value surface that is highly responsive to dataset variations, eliminating the need for additional exploration enhancements.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2503.0366

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback