Learning optimal environments using projected stochastic gradient ascent

Bolland, Adrien, Boukas, Ioannis, Cornet, François, Berger, Mathias, Ernst, Damien

arXiv.org Machine Learning 

In this work, we generalize direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). The latter can be used to jointly learn a reinforcement learning (RL) environment and a policy that maximize the expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametrized space of Mass-Spring-Damper (MSD) environments. Then, we use our algorithm to optimize the sizing of the components and the operation of a small-scale, autonomous energy system, i.e., a solar off-grid microgrid composed of photovoltaic panels, batteries, etc.
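To make the core idea concrete, the following is a minimal sketch of projected stochastic gradient ascent over a joint parameter vector of environment parameters (psi) and policy parameters (theta). It is not the paper's implementation: the function names (`projected_sga`, `grad_estimator`), the box-constrained hypothesis spaces, and the fixed learning rate are assumptions made for illustration; the paper's gradient estimator would be obtained from Monte-Carlo rollouts of the policy in the parametrized environment.

```python
import numpy as np

def projected_sga(grad_estimator, psi0, theta0, bounds_psi, bounds_theta,
                  lr=1e-2, n_iters=1000):
    """Jointly ascend the expected return over environment parameters (psi)
    and policy parameters (theta), projecting each iterate back onto its
    (assumed box-shaped) hypothesis space.

    grad_estimator(psi, theta) is assumed to return a stochastic estimate
    of (dJ/dpsi, dJ/dtheta), e.g. from Monte-Carlo rollouts.
    """
    psi = np.asarray(psi0, dtype=float)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        g_psi, g_theta = grad_estimator(psi, theta)
        # Gradient ascent step followed by projection onto the box constraints.
        psi = np.clip(psi + lr * g_psi, *bounds_psi)
        theta = np.clip(theta + lr * g_theta, *bounds_theta)
    return psi, theta


if __name__ == "__main__":
    # Toy usage: a quadratic surrogate objective standing in for the expected return.
    rng = np.random.default_rng(0)
    noisy_grad = lambda psi, theta: (-2.0 * (psi - 1.0) + 0.1 * rng.normal(size=psi.shape),
                                     -2.0 * (theta + 0.5) + 0.1 * rng.normal(size=theta.shape))
    psi_star, theta_star = projected_sga(noisy_grad,
                                         psi0=np.zeros(2), theta0=np.zeros(3),
                                         bounds_psi=(np.full(2, -2.0), np.full(2, 2.0)),
                                         bounds_theta=(np.full(3, -2.0), np.full(3, 2.0)))
    print(psi_star, theta_star)
```

The joint update reflects the paper's central point: the environment parameters are treated as decision variables alongside the policy parameters, rather than being fixed in advance.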
