Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots

Nurtdinov, Damir, Korshuk, Aliaksei, Kornaev, Alexei, Maloletov, Alexander

arXiv.org Artificial Intelligence

This study evaluates the performance of classical and modern control methods for real-world Cable-Driven Parallel Robots (CDPRs), focusing on underconstrained systems with limited time discretization. A comparative analysis is conducted between classical PID controllers and modern reinforcement learning algorithms, including Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO). The results demonstrate that TRPO outperforms the other methods, achieving the lowest root mean square (RMS) errors across various trajectories and exhibiting robustness to larger time intervals between control updates. TRPO's ability to balance exploration and exploitation enables stable control in noisy, real-world environments, reducing reliance on high-frequency sensor feedback and computational demands. CDPRs have a distinctive cable-based structure that allows them to move heavy loads within a comparatively large workspace.
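For readers who want to set up a comparison of this kind, the sketch below trains TRPO and PPO through SB3-Contrib and Stable-Baselines3 and reports an RMS error over a rollout. The CDPR hardware and environment from the paper are not public, so Pendulum-v1 stands in, and the angle-based RMS error is an illustrative proxy rather than the authors' metric.

```python
# Minimal sketch (not the authors' setup): compare TRPO and PPO on a
# stand-in continuous-control task and report an RMS error over a rollout.
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from sb3_contrib import TRPO

env = gym.make("Pendulum-v1")

for algo in (TRPO, PPO):
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=50_000)

    # Roll out the trained policy; use the pendulum's angular deviation
    # from upright as an illustrative tracking-error signal.
    obs, _ = env.reset(seed=0)
    errors = []
    for _ in range(200):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        errors.append(np.arccos(np.clip(obs[0], -1.0, 1.0)))  # |theta| from cos(theta)
        if terminated or truncated:
            break
    print(algo.__name__, "RMS angle error:", np.sqrt(np.mean(np.square(errors))))
```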




743c41a921516b04afde48bb48e28ce6-AuthorFeedback.pdf

Neural Information Processing Systems

HOOF is robust to settings within this range. We could not present results for Ant and Walker due to space constraints. Thus we are restricted to zero-order optimisers. For natural gradients like TNPG, HOOF does not add any new hyperparameters beyond those used by grid search; other methods like PBT introduce more hyperparameters than these.
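The feedback concerns HOOF, which selects hyperparameters on the fly by re-weighting trajectories from the current batch rather than running a separate grid search. A minimal sketch of that kind of weighted-importance-sampling candidate ranking is below; the synthetic data and the `wis_value` helper are illustrative stand-ins, not HOOF's actual code.

```python
# Hedged sketch: rank candidate policies via weighted importance sampling
# (WIS) using trajectories already collected by the behaviour policy,
# i.e. with no extra environment interaction.
import numpy as np

def wis_value(candidate_logps, behaviour_logps, returns):
    """Estimate a candidate's value from behaviour-policy trajectories.

    candidate_logps, behaviour_logps: per-trajectory summed action log-probs.
    returns: per-trajectory returns observed under the behaviour policy.
    """
    log_w = candidate_logps - behaviour_logps       # log importance weights
    w = np.exp(log_w - log_w.max())                 # shift for numerical stability
    return float(np.sum(w * returns) / np.sum(w))   # weighted IS estimate

# Toy example: pick the best learning rate from one shared batch.
rng = np.random.default_rng(0)
behaviour_logps = rng.normal(-50.0, 5.0, size=64)
returns = rng.normal(100.0, 10.0, size=64)
candidates = {lr: behaviour_logps + rng.normal(0.0, 0.5, size=64)  # stand-in log-probs
              for lr in (1e-4, 3e-4, 1e-3)}
best_lr = max(candidates, key=lambda lr: wis_value(candidates[lr], behaviour_logps, returns))
print("selected learning rate:", best_lr)
```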


Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space

Zhang, Xinyu, Deb, Aishik, Mueller, Klaus

arXiv.org Artificial Intelligence

In policy-gradient methods such as Proximal Policy Optimization (PPO), each update typically follows a single stochastic gradient direction, leaving the rich local structure of the parameter space unexplored. Previous work has shown that the surrogate gradient is often poorly correlated with the true reward landscape. Building on this insight, we visualize the parameter space spanned by policy checkpoints within an iteration and reveal that higher-performing solutions often lie in nearby unexplored regions. To exploit this opportunity, we introduce ExploRLer, a pluggable pipeline that seamlessly integrates with on-policy algorithms such as PPO and TRPO, systematically probing the unexplored neighborhoods of surrogate on-policy gradient updates. Without increasing the number of gradient updates, ExploRLer achieves significant improvements over baselines in complex continuous control environments. Our results demonstrate that iteration-level exploration provides a practical and effective way to strengthen on-policy reinforcement learning and offer a fresh perspective on the limitations of the surrogate objective.
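ExploRLer's own pipeline is not reproduced here, but the underlying idea, probing the neighborhood of an on-policy update in parameter space and keeping a better-performing candidate if one is found, can be sketched as follows. The `evaluate_policy` callback and the isotropic Gaussian probes are assumptions for illustration, not the paper's method.

```python
# Hedged sketch of iteration-level parameter-space exploration around a
# policy-gradient update, in the spirit of (but not identical to) ExploRLer.
# `evaluate_policy` is a placeholder for a rollout-based return estimate.
import copy
import torch

def explore_neighborhood(policy, evaluate_policy, radius=0.01, n_probes=8):
    """After an on-policy update, probe nearby points in parameter space
    and return the best-performing candidate found."""
    best_policy, best_score = policy, evaluate_policy(policy)
    for _ in range(n_probes):
        candidate = copy.deepcopy(policy)
        with torch.no_grad():
            for p in candidate.parameters():
                p.add_(radius * torch.randn_like(p))  # isotropic Gaussian probe
        score = evaluate_policy(candidate)
        if score > best_score:
            best_policy, best_score = candidate, score
    return best_policy, best_score

# Toy usage with a dummy policy and a dummy evaluator.
policy = torch.nn.Linear(4, 2)
dummy_eval = lambda p: -sum(float(w.abs().sum()) for w in p.parameters())
best, score = explore_neighborhood(policy, dummy_eval)
print("best neighborhood score:", score)
```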


Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks

Gardner, Jason, Dutta, Ayan, Roy, Swapnoneel, Kreidl, O. Patrick, Boloni, Ladislau

arXiv.org Artificial Intelligence

The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. While algorithmic efficiency in terms of learning performance has been extensively studied, the energy requirements, greenhouse gas emissions, and monetary costs of DRL algorithms remain largely unexplored. In this work, we present a systematic benchmarking study of the energy consumption of seven state-of-the-art DRL algorithms, namely DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN, implemented using Stable Baselines. Each algorithm was trained for one million steps on each of ten Atari 2600 games, and power consumption was measured in real time to estimate total energy usage, CO2-equivalent emissions, and electricity cost based on the U.S. national average electricity price. Our results reveal substantial variation in energy efficiency and training cost across algorithms, with some achieving comparable performance while consuming up to 24% less energy (ARS vs. DQN), emitting nearly 68% less CO2, and incurring almost 68% lower monetary cost (QR-DQN vs. RecurrentPPO) than less efficient counterparts. We further analyze the trade-offs between learning performance, training time, energy use, and financial cost, highlighting cases where algorithmic choices can mitigate environmental and economic impact without sacrificing learning performance. This study provides actionable insights for developing energy-aware and cost-efficient DRL practices and establishes a foundation for incorporating sustainability considerations into future algorithmic design and evaluation.
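The unit conversions behind such a benchmark are straightforward; the sketch below turns a measured average power draw into energy, CO2-equivalent emissions, and electricity cost. The power, duration, emission factor, and price constants are illustrative assumptions, not measurements or values from the paper.

```python
# Hedged sketch of the bookkeeping behind an energy/cost benchmark:
# convert an average power draw over a training run into kWh, kg CO2e,
# and USD. All constants below are illustrative assumptions.

AVG_POWER_W = 250.0      # assumed average draw during training
TRAIN_HOURS = 12.0       # assumed wall-clock time for one million steps
KG_CO2E_PER_KWH = 0.39   # illustrative grid emission factor
USD_PER_KWH = 0.16       # illustrative average electricity price

energy_kwh = AVG_POWER_W * TRAIN_HOURS / 1000.0  # W * h -> kWh
co2e_kg = energy_kwh * KG_CO2E_PER_KWH
cost_usd = energy_kwh * USD_PER_KWH

print(f"energy: {energy_kwh:.2f} kWh, CO2e: {co2e_kg:.2f} kg, cost: ${cost_usd:.2f}")
```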