HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation

Jan-22-2025–arXiv.org Artificial Intelligence

This paper introduces HEPPO, an FPGA-based accelerator designed to optimize the Generalized Advantage Estimation (GAE) stage in Proximal Policy Optimization (PPO). Unlike previous approaches that focused on trajectory collection and actor-critic updates, HEPPO addresses GAE's computational demands with a parallel, pipelined architecture implemented on a single System-on-Chip (SoC). This design allows for the adaptation of various hardware accelerators tailored for different PPO phases. A key innovation is our strategic standardization technique, which combines dynamic reward standardization and block standardization for values, followed by 8-bit uniform quantization. This method stabilizes learning, enhances performance, and manages memory bottlenecks, achieving a 4x reduction in memory usage and a 1.5x increase in cumulative rewards. We propose a solution on a single SoC device with programmable logic and embedded processors, delivering throughput orders of magnitude higher than traditional CPU-GPU systems. Our single-chip solution minimizes communication latency and throughput bottlenecks, significantly boosting PPO training efficiency. Experimental results show a 30% increase in PPO speed and a substantial reduction in memory access time, underscoring HEPPO's potential for broad applicability in hardware-efficient reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

Jan-22-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Switzerland
  - Zürich > Zürich (0.14)
- North America
  - Canada > Ontario
    - Hamilton (0.14)
  - United States (0.47)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Energy (0.46)
- Leisure & Entertainment > Games (0.46)
- Semiconductors & Electronics (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (0.94)
  - Reinforcement Learning (1.00)