AITopics | sac algorithm

Quantum deep reinforcement learning for humanoid robot navigation task

Lokossou, Romerik, Girma, Birhanu Shimelis, Tonguz, Ozan K., Biyabani, Ahmed

arXiv.org Artificial Intelligence, Sep-16-2025

Classical reinforcement learning (RL) methods often struggle in complex, high-dimensional environments because of their extensive parameter requirements and challenges posed by stochastic, non-deterministic settings. This study introduces quantum deep reinforcement learning (QDRL) to train humanoid agents efficiently. While previous quantum RL models focused on smaller environments, such as wheeled robots and robotic arms, our work pioneers the application of QDRL to humanoid robotics, specifically in environments with substantial observation and action spaces, such as MuJoCo's Humanoid-v4 and Walker2d-v4. Using parameterized quantum circuits, we explored a hybrid quantum-classical setup to directly navigate high-dimensional state spaces, bypassing traditional mapping and planning. By integrating quantum computing with deep RL, we aim to develop models that can efficiently learn complex navigation tasks in humanoid robots. We evaluated the performance of the Soft Actor-Critic (SAC) in classical RL against its quantum implementation. The results show that the quantum SAC achieves an 8% higher average return (246.40)

machine learning, reinforcement, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2509.11388

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Imitation Learning for Satellite Attitude Control under Unknown Perturbations

Zhang, Zhizhuo, Peng, Hao, Bai, Xiaoli

arXiv.org Artificial Intelligence, Jul-3-2025

This paper presents a novel satellite attitude control framework that integrates Soft Actor-Critic (SAC) reinforcement learning with Generative Adversarial Imitation Learning (GAIL) to achieve robust performance under various unknown perturbations. Traditional control techniques often rely on precise system models and are sensitive to parameter uncertainties and external perturbations. To overcome these limitations, we first develop a SAC-based expert controller that demonstrates improved resilience against actuator failures, sensor noise, and attitude misalignments, outperforming our previous results in several challenging scenarios. We then use GAIL to train a learner policy that imitates the expert's trajectories, thereby reducing training costs and improving generalization through expert demonstrations. Preliminary experiments under single and combined perturbations show that the SAC expert can rotate the antenna to a specified direction and keep the antenna orientation reliably stable under most of the listed perturbations. Additionally, the GAIL learner can imitate most of the features of the trajectories generated by the SAC expert. Comparative evaluations and ablation studies confirm the effectiveness of the SAC algorithm and reward shaping. The integration of GAIL further reduces sample complexity and demonstrates promising imitation capabilities, paving the way for more intelligent and autonomous spacecraft control systems.

INTRODUCTION

Satellite attitude control, which aims to accurately orient and stabilize satellites towards specific directions or targets in space, is a critical aspect of spacecraft missions. Particularly in environments with perturbations (such as orbital perturbations, atmospheric drag, or solar radiation pressure), traditional control methods often require additional compensation strategies.

experiment, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2507.01161

Country:

North America > United States > Rocky Mountains (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Florida > Volusia County > Daytona Beach (0.04)
North America > Canada > Rocky Mountains (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Renewable > Solar (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bidirectional Soft Actor-Critic: Leveraging Forward and Reverse KL Divergence for Efficient Reinforcement Learning

Zhang, Yixian, Tang, Huaze, Wei, Changxu, Ding, Wenbo

arXiv.org Artificial Intelligence, Jun-3-2025

The Soft Actor-Critic (SAC) algorithm, a state-of-the-art method in maximum entropy reinforcement learning, traditionally relies on minimizing reverse Kullback-Leibler (KL) divergence for policy updates. However, this approach leads to an intractable optimal projection policy, necessitating gradient-based approximations that can suffer from instability and poor sample efficiency. This paper investigates the alternative use of forward KL divergence within SAC. We demonstrate that for Gaussian policies, forward KL divergence yields an explicit optimal projection policy, corresponding to the mean and variance of the target Boltzmann distribution's action marginals. Building on the distinct advantages of both KL directions, we propose Bidirectional SAC, an algorithm that first initializes the policy using the explicit forward KL projection and then refines it by optimizing the reverse KL divergence. Comprehensive experiments on continuous control benchmarks show that Bidirectional SAC significantly outperforms standard SAC and other baselines, achieving up to a 30% increase in episodic rewards, alongside enhanced sample efficiency.
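The forward KL projection described in this abstract can be illustrated with moment matching: for a diagonal Gaussian policy, minimizing KL(p || π) over π sets the policy mean and variance to those of the target p. The following is a minimal sketch, assuming the target is the Boltzmann distribution p(a|s) ∝ exp(Q(s, a)/α) and that its moments are estimated with self-normalized weights over uniformly sampled candidate actions. The function name, the sampling scheme, and the toy quadratic Q-function are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def forward_kl_gaussian_projection(actions, q_values, alpha):
    """Moment-match a diagonal Gaussian to p(a) proportional to exp(Q(a) / alpha).

    actions:  (N, D) candidate actions, assumed drawn uniformly from the action box
    q_values: (N,) Q estimates for those actions
    alpha:    entropy temperature
    """
    logits = q_values / alpha
    logits = logits - logits.max()          # stabilize the exponential
    weights = np.exp(logits)
    weights /= weights.sum()                # self-normalized Boltzmann weights
    mean = weights @ actions                # weighted mean per action dimension
    var = weights @ (actions - mean) ** 2   # weighted variance per action dimension
    return mean, var

# Toy check: with Q(a) = -||a - target||^2 the Boltzmann target is Gaussian
# with mean `target` and per-dimension variance alpha / 2.
rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=(20000, 2))
target = np.array([0.3, -0.2])
q_values = -np.sum((actions - target) ** 2, axis=1)
mean, var = forward_kl_gaussian_projection(actions, q_values, alpha=0.02)
```

In this toy setting the recovered mean should lie close to `target` and the variance close to `alpha / 2`, which is the closed-form projection that the reverse KL direction does not offer.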

kl divergence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2506.01639

Genre: Research Report (1.00)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Real-time Monitoring and Analysis of Track and Field Athletes Based on Edge Computing and Deep Reinforcement Learning Algorithm

Tang, Xiaowei, Long, Bin, Zhou, Li

arXiv.org Artificial Intelligence, Nov-11-2024

As a fundamental sports discipline, track and field not only forms the core of major events like the Olympics and World Championships but also plays a crucial role in promoting public health Jacobsson, Ekberg, Timpka, Haggren Råsberg, Sjöberg, Mirkovic and Nilsson (2020); Timpka, Dahlström, Fagher, Adami, Andersson, Jacobsson, Svedin and Bermon (2022). The wide variety of track and field events, including sprints, middle and long-distance running, jumps, and throws, demand high levels of physical fitness, technical skills, and mental strength from athletes Guo (2022); Zhang et al. (2023a). To excel in such competitive environments, athletes require not only innate talent and dedication but also scientific and systematic training methods Zhang et al. (2023b); Yuan et al. (2024).

In recent years, real-time monitoring and data analysis have become increasingly critical in enhancing athletic performance. Studies have shown that by monitoring physiological indicators (such as heart rate, body temperature, and blood oxygen saturation) and performance metrics (such as speed, acceleration, and force) in real-time, it is possible to identify problems during training promptly and make targeted adjustments. For example, analyzing heart rate changes under different training intensities can assess endurance levels and recovery status, while monitoring gait and acceleration during running can optimize technical movements and improve efficiency Rana and Mittal (2020a). Many studies have begun exploring the potential of using sensor technology and data

accuracy, algorithm, athlete, (15 more...)

arXiv.org Artificial Intelligence

2411.0672

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Leisure & Entertainment > Sports > Running (0.68)
Leisure & Entertainment > Sports > Track & Field (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing

Gu, Xueying, Wu, Qiong, Fan, Pingyi, Cheng, Nan, Chen, Wen, Letaief, Khaled B.

arXiv.org Artificial Intelligence, Aug-27-2024

Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to the Road Side Unit (RSU), ensuring timely services. Our previous work proposed the FLSimCo algorithm, which uses local resources for Federated Self-Supervised Learning (SSL), although vehicles often cannot complete all training iterations. Our improved algorithm offloads part of the task to the RSU and optimizes energy consumption by adjusting transmission power, CPU frequency, and task assignment ratios, balancing local and RSU-based training. Meanwhile, setting an offloading threshold further prevents inefficiencies. Simulation results show that the enhanced algorithm reduces energy consumption and improves both offloading efficiency and the accuracy of Federated SSL.

algorithm, energy consumption, vehicle, (11 more...)

arXiv.org Artificial Intelligence

2408.14831

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Energy (0.71)
Information Technology (0.46)
Transportation > Infrastructure & Services (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(3 more...)

DRL-Based Resource Allocation for Motion Blur Resistant Federated Self-Supervised Learning in IoV

Gu, Xueying, Wu, Qiong, Fan, Pingyi, Fan, Qiang, Cheng, Nan, Chen, Wen, Letaief, Khaled B.

arXiv.org Artificial Intelligence, Aug-17-2024

In the Internet of Vehicles (IoV), Federated Learning (FL) provides a privacy-preserving solution by aggregating local models without sharing data. Traditional supervised learning requires image data with labels, but data labeling involves significant manual effort. Federated Self-Supervised Learning (FSSL) utilizes Self-Supervised Learning (SSL) for local training in FL, eliminating the need for labels while protecting privacy. Compared to other SSL methods, Momentum Contrast (MoCo) reduces the demand for computing resources and storage space by creating a dictionary. However, using MoCo in FSSL requires uploading the local dictionary from vehicles to the Base Station (BS), which poses a risk of privacy leakage. Simplified Contrast (SimCo) addresses the privacy leakage issue in MoCo-based FSSL by using dual temperature instead of a dictionary to control sample distribution. Additionally, considering the negative impact of motion blur on model aggregation, and based on SimCo, we propose a motion blur-resistant FSSL method, referred to as BFSSL. Furthermore, we address energy consumption and delay in the BFSSL process by proposing a Deep Reinforcement Learning (DRL)-based resource allocation scheme, called DRL-BFSSL. In this scheme, the BS allocates the Central Processing Unit (CPU) frequency and transmission power of vehicles to minimize energy consumption and latency, while aggregating received models based on the motion blur level. Simulation results validate the effectiveness of our proposed aggregation and resource allocation methods.

algorithm, learning, vehicle, (16 more...)

arXiv.org Artificial Intelligence

2408.09194

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry:

Information Technology (1.00)
Telecommunications (0.87)
Energy (0.69)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems

Wen, Jinbo, Kang, Jiawen, Niyato, Dusit, Zhang, Yang, Mao, Shiwen

arXiv.org Artificial Intelligence, Aug-2-2024

Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout the product life cycle, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. Thanks to its data processing capability, Generative Artificial Intelligence (GAI) can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing. However, mechanisms that leverage sensing Industrial Internet of Things (IIoT) devices to share data for the construction of DTs are susceptible to adverse selection problems. In this paper, we first develop a GAI-driven DT architecture for ICPSs. To address the adverse selection problem caused by information asymmetry, we propose a contract theory model and develop the sustainable diffusion-based soft actor-critic algorithm to identify the optimal feasible contract. Specifically, we leverage the dynamic structured pruning technique to reduce the number of parameters in the actor networks, allowing a sustainable and efficient implementation of the proposed algorithm. Finally, numerical results demonstrate the effectiveness of the proposed scheme.

algorithm, dt construction, iiot device, (13 more...)

arXiv.org Artificial Intelligence

2408.01173

Country:

Asia > Singapore (0.05)
Asia > China > Jiangsu Province > Nanjing (0.05)
North America > United States > New York > Kings County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (1.00)
Energy (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks

Qi, Kangwei, Wu, Qiong, Fan, Pingyi, Cheng, Nan, Chen, Wen, Wang, Jiangzhou, Letaief, Khaled B.

arXiv.org Artificial Intelligence, Jun-17-2024

Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-to-infrastructure (V2I) links and the stability of vehicle-to-vehicle (V2V) links, we introduce the age of information (AoI) model and the payload transmission probability model. With the objective of minimizing the AoI of V2I links and prioritizing the transmission of V2V link payloads, we formulate this optimization problem as a Markov decision process (MDP) in which the BS serves as an agent that allocates resources and controls the phase shift for the vehicles using the soft actor-critic (SAC) algorithm, which gradually converges and maintains high stability. An AoI-aware joint vehicular resource allocation and RIS phase-shift control scheme based on the SAC algorithm is proposed, and simulation results show that its convergence speed, cumulative reward, AoI performance, and payload transmission probability outperform those of proximal policy optimization (PPO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3) and stochastic algorithms.
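The age-of-information (AoI) objective mentioned in this abstract can be made concrete with the commonly used discrete-time definition: a link's AoI grows by one slot while no update is delivered and resets once a fresh update arrives. The sketch below assumes that standard definition; the function names and the delivery trace are hypothetical, and the paper's exact AoI and payload models are not given in this abstract.

```python
def step_aoi(aoi, delivered, slot=1):
    """Return the AoI after one time slot.

    If an update was delivered in this slot, the age resets to one slot;
    otherwise the existing age grows by one slot.
    """
    return slot if delivered else aoi + slot

def average_aoi(deliveries, initial_aoi=1):
    """Roll the AoI forward over a delivery trace and return (average, history)."""
    aoi = initial_aoi
    history = []
    for delivered in deliveries:
        aoi = step_aoi(aoi, delivered)
        history.append(aoi)
    return sum(history) / len(history), history

# Hypothetical trace: the V2I link succeeds in the third and fifth slots.
avg, history = average_aoi([False, False, True, False, True])
```

An RL agent that minimizes AoI would typically receive the negative of this per-slot age as part of its reward, so that more frequent successful V2I deliveries are rewarded.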

algorithm, communication, vehicle, (8 more...)

arXiv.org Artificial Intelligence

2406.11245

Country:

North America (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)
(6 more...)

Genre: Research Report > New Finding (0.66)

Industry: Telecommunications (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning

Lopez, Antonio, Fridovich-Keil, David

arXiv.org Artificial Intelligence, Mar-18-2024

Recent methods using Reinforcement Learning (RL) have proven to be successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios. This is because current state-of-the-art RL methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications. In this paper, we build on existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF), which is demonstrated to reduce the sample complexity. Still, this formulation requires knowing a CLF of the system, and because no general method exists, identifying a suitable CLF is often a challenge. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure. However, this class of methods becomes intractable on high-dimensional systems, a problem that we address by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. Through multiple examples, we demonstrate the effectiveness of this approach, where our method finds a policy to successfully land a quadcopter in less than half the amount of real-world data required by the state-of-the-art Soft Actor-Critic algorithm.
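The idea of reshaping the reward with a CLF can be sketched as follows. The abstract does not give the exact shaping term, so this example swaps in generic potential-based reward shaping with the potential set to the negative CLF value; the quadratic CLF V(x) = xᵀPx, the matrix `P`, the discount `gamma`, and the function names are all assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def clf_value(state, P):
    """Quadratic candidate CLF V(x) = x^T P x (assumed form, P positive definite)."""
    return float(state @ P @ state)

def shaped_reward(reward, state, next_state, P, gamma=0.99):
    """Potential-based shaping with potential -V: r' = r + V(s) - gamma * V(s')."""
    return reward + clf_value(state, P) - gamma * clf_value(next_state, P)

P = np.eye(2)
s = np.array([1.0, 0.0])
s_next = np.array([0.5, 0.0])

# Moving toward the origin decreases V and earns a positive bonus;
# moving away increases V and is penalized.
bonus_towards = shaped_reward(0.0, s, s_next, P)
bonus_away = shaped_reward(0.0, s_next, s, P)
```

Under this assumed form, transitions that decrease the CLF receive extra reward at every step, which gives the agent a dense learning signal even when the task reward itself is sparse.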

algorithm, reinforcement learning, subsystem, (10 more...)

arXiv.org Artificial Intelligence

2403.1221

Country:

North America > United States > Texas (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Government (0.68)
Transportation > Air (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Deep Reinforcement Learning for Community Battery Scheduling under Uncertainties of Load, PV Generation, and Energy Prices

Fan, Jiarong, Wang, Hao

arXiv.org Artificial Intelligence, Dec-4-2023

In response to the growing uptake of distributed energy resources (DERs), community batteries have emerged as a promising solution to support renewable energy integration, reduce peak load, and enhance grid reliability. This paper presents a deep reinforcement learning (RL) strategy, centered around the soft actor-critic (SAC) algorithm, to schedule a community battery system in the presence of uncertainties, such as solar photovoltaic (PV) generation, local demand, and real-time energy prices. We position the community battery to play a versatile role in integrating local PV energy, reducing peak load, and exploiting energy price fluctuations for arbitrage, thereby minimizing the system cost. To improve exploration and convergence during RL training, we utilize the noisy network technique. This paper conducts a comparative study of different RL algorithms, including proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms, to evaluate their effectiveness in the community battery scheduling problem. The results demonstrate the potential of RL in addressing community battery scheduling challenges and show that the SAC algorithm achieves the best performance compared to RL and optimization benchmarks.

algorithm, battery, community battery, (15 more...)

arXiv.org Artificial Intelligence

2312.03008

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > Western Australia (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)
Energy > Energy Storage (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
