AITopics | trpo

trpo

Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy

Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang

Neural Information Processing Systems, Feb-11-2026, 16:53:04 GMT

See also, e.g., [1] for a Bayesian inference perspective.

artificial intelligence, arxiv preprint arxiv, machine learning, (12 more...)

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

5cb0e249689cd6d8369c4885435a56c2-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-8-2026, 13:45:18 GMT

experiment, mixedne-ld, reviewer, (10 more...)

Technology: Information Technology > Artificial Intelligence (0.31)

A Appendix

Neural Information Processing Systems, Nov-13-2025, 23:42:55 GMT

Confined trust regions are a stable way of making large updates and avoiding pessimistic coefficients.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots

Nurtdinov, Damir, Korshuk, Aliaksei, Kornaev, Alexei, Maloletov, Alexander

arXiv.org Artificial Intelligence, Oct-10-2025

This study evaluates the performance of classical and modern control methods for real-world Cable-Driven Parallel Robots (CDPRs), focusing on underconstrained systems with limited time discretization. A comparative analysis is conducted between classical PID controllers and modern reinforcement learning algorithms, including Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO). The results demonstrate that TRPO outperforms other methods, achieving the lowest root mean square (RMS) errors across various trajectories and exhibiting robustness to larger time intervals between control updates. TRPO's ability to balance exploration and exploitation enables stable control in noisy, real-world environments, reducing reliance on high-frequency sensor feedback and computational demands. Cable-Driven Parallel Robots (CDPR) have unique parameters, which means they can move heavy loads within a fairly large space.
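The RMS error metric mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the error is the root mean square of the per-sample Euclidean distance between a reference trajectory and the measured end-effector positions; the abstract does not spell out the exact definition, and the function name `rms_error` and the sample trajectories are hypothetical.

```python
import math


def rms_error(reference, measured):
    """Root mean square of the per-sample Euclidean tracking error.

    Assumption: both trajectories are equal-length sequences of points
    with matching dimensions. This is an illustrative definition, not
    necessarily the one used in the study.
    """
    if len(reference) != len(measured) or len(reference) == 0:
        raise ValueError("trajectories must be non-empty and equal in length")
    squared_sum = 0.0
    for ref_point, meas_point in zip(reference, measured):
        # Squared Euclidean distance between the two points at this sample.
        squared_sum += sum((r - m) ** 2 for r, m in zip(ref_point, meas_point))
    return math.sqrt(squared_sum / len(reference))


# Hypothetical 2-D trajectories: only the middle sample deviates, by 1 unit.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
measured = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(rms_error(reference, measured))
```

A lower value means the controller tracked the reference more closely, which is how the controllers in such a comparison would be ranked under this assumed definition.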

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2510.0827

Country:

Europe > Russia (0.05)
Asia > Russia (0.05)
North America > United States > California (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.90)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

A Appendix

Neural Information Processing Systems, Oct-9-2025, 15:08:36 GMT

Confined trust regions are a stable way of making large updates and avoiding pessimistic coefficients.

assumption, trajectory, variant, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

5cb0e249689cd6d8369c4885435a56c2-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-3-2025, 00:37:17 GMT

artificial intelligence, mixedne-ld, reviewer, (11 more...)

Technology: Information Technology > Artificial Intelligence (0.31)

743c41a921516b04afde48bb48e28ce6-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-3-2025, 00:26:25 GMT

HOOF is robust to settings within this range. We could not present results for Ant and Walker due to space constraints. Thus we are restricted to zero order optimisers. For natural gradients like TNPG, HOOF does not add any new hyperparameters beyond those used by grid search - i.e. Other methods like PBT introduce more hyperparameters than these.

artificial intelligence, constraint, hyperparameter, (16 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.32)

Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space

Zhang, Xinyu, Deb, Aishik, Mueller, Klaus

arXiv.org Artificial Intelligence, Oct-1-2025

Policy-gradient methods such as Proximal Policy Optimization (PPO) are typically updated along a single stochastic gradient direction, leaving the rich local structure of the parameter space unexplored. Previous work has shown that the surrogate gradient is often poorly correlated with the true reward landscape. Building on this insight, we visualize the parameter space spanned by policy checkpoints within an iteration and reveal that higher performing solutions often lie in nearby unexplored regions. To exploit this opportunity, we introduce ExploRLer, a pluggable pipeline that seamlessly integrates with on-policy algorithms such as PPO and TRPO, systematically probing the unexplored neighborhoods of surrogate on-policy gradient updates. Without increasing the number of gradient updates, ExploRLer achieves significant improvements over baselines in complex continuous control environments. Our results demonstrate that iteration-level exploration provides a practical and effective way to strengthen on-policy reinforcement learning and offer a fresh perspective on the limitations of the surrogate objective.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2509.25876

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.05)
Europe > France (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks

Gardner, Jason, Dutta, Ayan, Roy, Swapnoneel, Kreidl, O. Patrick, Boloni, Ladislau

arXiv.org Artificial Intelligence, Sep-8-2025

The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. While algorithmic efficiency in terms of learning performance has been extensively studied, the energy requirements, greenhouse gas emissions, and monetary costs of DRL algorithms remain largely unexplored. In this work, we present a systematic benchmarking study of the energy consumption of seven state-of-the-art DRL algorithms, namely DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN, implemented using Stable Baselines. Each algorithm was trained for one million steps on each of ten Atari 2600 games, and power consumption was measured in real time to estimate total energy usage, CO2-equivalent emissions, and electricity cost based on the U.S. national average electricity price. Our results reveal substantial variation in energy efficiency and training cost across algorithms, with some achieving comparable performance while consuming up to 24% less energy (ARS vs. DQN), emitting nearly 68% less CO2, and incurring almost 68% lower monetary cost (QR-DQN vs. RecurrentPPO) than less efficient counterparts. We further analyze the trade-offs between learning performance, training time, energy use, and financial cost, highlighting cases where algorithmic choices can mitigate environmental and economic impact without sacrificing learning performance. This study provides actionable insights for developing energy-aware and cost-efficient DRL practices and establishes a foundation for incorporating sustainability considerations into future algorithmic design and evaluation.
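The estimation chain described in the abstract above (power samples, then energy, then emissions and cost) can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions: power is sampled at a fixed interval and integrated with a rectangle rule, and the carbon intensity and electricity price used in the example are hypothetical placeholders, not figures from the study. The function names `energy_kwh` and `footprint` are likewise illustrative.

```python
def energy_kwh(power_watts, interval_s):
    """Integrate fixed-interval power samples (W) into energy (kWh).

    Rectangle rule: each sample is assumed to hold for one full interval.
    """
    joules = sum(power_watts) * interval_s
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


def footprint(power_watts, interval_s, kg_co2e_per_kwh, usd_per_kwh):
    """Derive energy, CO2-equivalent emissions, and cost from power samples."""
    kwh = energy_kwh(power_watts, interval_s)
    return {
        "kwh": kwh,
        "kg_co2e": kwh * kg_co2e_per_kwh,
        "usd": kwh * usd_per_kwh,
    }


# Hypothetical run: a constant 1000 W draw sampled once per second for one hour.
samples = [1000.0] * 3600
# Placeholder carbon intensity and price, chosen only for illustration.
result = footprint(samples, interval_s=1.0, kg_co2e_per_kwh=0.4, usd_per_kwh=0.15)
print(result)
```

Because emissions and cost are both linear in energy under this model, relative savings between two algorithms carry over unchanged from energy to emissions and cost whenever the same intensity and price are applied to both.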

large language model, machine learning, reinforcement learning, (22 more...)

2509.05273

Country: