Appendix A Visual Reinforcement Learning Baselines

Apr-25-2026, 05:28:40 GMT–Neural Information Processing Systems

DrQ: A model-free, off-policy reinforcement learning algorithm based on Soft Actor-Critic (SAC) [19]. DrQ improves training stability by applying data augmentation to regularize the Q-values of state-action pairs. The key idea of DrQ is to promote similarity between augmented state-action pairs. The Q-regularization technique is shown in Eq 1, where K is the number of samples and T is the set of augmentations.

$\frac{1}{K} \sum_{k=1}^{K} Q(f(s, \nu_k), a_k)$, where $\nu_k \sim \mathcal{T}$ and $a_k \sim \pi(\cdot \mid f(s, \nu_k))$    (1)

DrQ-v2: An improved version of DrQ. DrQ-v2 combines essential elements of the DDPG algorithm with data augmentation to improve the performance of visual RL agents. It also incorporates n-step returns and a target critic, and achieves strong results on most of the medium- and hard-level DM-Control tasks.

The auxiliary contrastive loss (Eq 3) allows the agent to learn better image representations during training, which eases the optimization difficulty under high-dimensional inputs.
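The averaging over augmented views in Eq 1 can be illustrated with a minimal pure-Python sketch. It assumes a random-shift augmentation for f(s, ν) with edge-replicated padding, and it uses hypothetical toy stand-ins (`toy_policy`, `toy_q`) for the actor and critic. None of these names come from the source, and this is not the authors' implementation.

```python
import random


def random_shift(image, pad, rng):
    """f(s, nu): replicate-pad the image by `pad` pixels, then crop back
    to the original size at a random offset nu = (dy, dx)."""
    h, w = len(image), len(image[0])
    dy = rng.randint(0, 2 * pad)  # vertical crop offset inside the padded image
    dx = rng.randint(0, 2 * pad)  # horizontal crop offset
    out = []
    for y in range(h):
        # Clamping the source index is equivalent to edge-replicated padding.
        src_y = min(max(y + dy - pad, 0), h - 1)
        row = []
        for x in range(w):
            src_x = min(max(x + dx - pad, 0), w - 1)
            row.append(image[src_y][src_x])
        out.append(row)
    return out


def _mean_pixel(obs):
    return sum(sum(row) for row in obs) / (len(obs) * len(obs[0]))


def toy_policy(obs, rng):
    """Hypothetical stochastic policy: a_k ~ pi(. | f(s, nu_k))."""
    return _mean_pixel(obs) + rng.gauss(0.0, 0.1)


def toy_q(obs, action):
    """Hypothetical critic: highest when the action matches the mean pixel."""
    return -(action - _mean_pixel(obs)) ** 2


def averaged_q(state, q_fn, policy_fn, k, pad, rng):
    """Eq 1: (1/K) * sum_k Q(f(s, nu_k), a_k), with nu_k and a_k sampled per view."""
    total = 0.0
    for _ in range(k):
        aug = random_shift(state, pad, rng)  # nu_k ~ T
        action = policy_fn(aug, rng)         # a_k ~ pi(. | f(s, nu_k))
        total += q_fn(aug, action)
    return total / k


if __name__ == "__main__":
    rng = random.Random(0)
    state = [[float(x + y) for x in range(8)] for y in range(8)]
    print(averaged_q(state, toy_q, toy_policy, k=2, pad=2, rng=rng))
```

In an actual agent the critic and policy would be neural networks and the augmentation would operate on batched image tensors. The sketch only shows how each of the K views gets its own augmentation sample and its own action before the Q-values are averaged.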

