A General Approach of Automated Environment Design for Learning the Optimal Power Flow

May-14-2025–arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms are increasingly used to solve the optimal power flow (OPF) problem. Yet the question of how to design RL environments to maximize training performance remains unanswered, both for OPF and in general. We propose a general approach for automated RL environment design based on multi-objective optimization. To this end, we use the hyperparameter optimization (HPO) framework, which allows existing HPO algorithms and methods to be reused. On five OPF benchmark problems, we demonstrate that our automated design approach consistently outperforms a manually created baseline environment design. Further, we use statistical analyses to determine which environment design decisions are especially important for performance, which yields multiple novel insights into how RL-OPF environments should be designed. Finally, we discuss the risk of overfitting the environment to the RL algorithm used. To the best of our knowledge, this is the first general approach for automated RL environment design.
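To make the idea of treating environment design as a multi-objective HPO problem more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical design space (`normalize_observations`, `penalty_weight`, `episode_length`), uses plain random search as a stand-in for whatever HPO algorithm is actually applied, and replaces RL training with a toy scoring function. Only the overall pattern (sample a design, evaluate several objectives, keep the Pareto-optimal designs) reflects the abstract.

```python
import random

# Hypothetical design space: names and values are illustrative assumptions,
# not the design decisions studied in the paper.
DESIGN_SPACE = {
    "normalize_observations": [True, False],
    "penalty_weight": [0.1, 1.0, 10.0],
    "episode_length": [1, 5, 10],
}


def sample_design(rng):
    """Draw one environment design by picking a value for each decision."""
    return {name: rng.choice(options) for name, options in DESIGN_SPACE.items()}


def evaluate_design(design, rng):
    """Toy stand-in for 'train an RL agent in this environment and test it'.

    Returns two objectives to minimize: (optimality_gap, constraint_violation).
    The formulas are arbitrary placeholders, chosen only to create a trade-off.
    """
    noise = rng.uniform(0.0, 0.05)
    optimality_gap = 0.1 * design["penalty_weight"] / design["episode_length"] + noise
    violation = 1.0 / (1.0 + design["penalty_weight"])
    if not design["normalize_observations"]:
        violation += 0.2
    return (optimality_gap, violation)


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(results):
    """Keep the (design, objectives) pairs that no other result dominates."""
    return [
        (design, obj)
        for design, obj in results
        if not any(dominates(other, obj) for _, other in results)
    ]


def random_search(n_trials, seed=0):
    """Random search over the design space, returning the Pareto-optimal designs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        design = sample_design(rng)
        results.append((design, evaluate_design(design, rng)))
    return pareto_front(results)
```

Calling `random_search(20)` returns the non-dominated designs among 20 sampled candidates. In a real setting, `evaluate_design` would train and test an RL agent, and the random sampler could be swapped for an existing multi-objective HPO algorithm, which is the kind of reuse the abstract refers to.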

environment design, machine learning, reinforcement learning

Country:
- Asia > China
  - Chongqing Province > Chongqing (0.04)
- Europe
  - Germany > Lower Saxony
    - Oldenburg (0.04)
  - Netherlands > South Holland
    - Rotterdam (0.05)
- North America > United States
  - California
    - Orange County > Anaheim (0.04)
    - San Francisco County > San Francisco (0.14)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - New York > New York County
    - New York City (0.04)

Genre:
- Research Report
  - Experimental Study (0.68)
  - New Finding (0.46)

Industry:
- Energy > Power Industry (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning > Optimization (1.00)