parameterized action
Parameterized Reinforcement Learning for Optical System Optimization
Wankerl, Heribert, Stern, Maike L., Mahdavi, Ali, Eichler, Christoph, Lang, Elmar W.
Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design parameters to describe a multi-layer stack: each layer's dielectric material and thickness as well as the total number of layers. Such a combination of both discrete and continuous parameters is a challenging optimization problem that often requires a computationally expensive search for an optimal system design. Hence, most methods merely determine the optimal thicknesses of the system's layers. To incorporate the layer materials and the total number of layers as well, we propose a method that treats the stacking of consecutive layers as parameterized actions in a Markov decision process. We propose an exponentially transformed reward signal that eases policy optimization and adapt a recent variant of Q-learning for inverse design optimization. We demonstrate that our method outperforms human experts and a naive reinforcement learning algorithm with respect to the achieved optical characteristics. Moreover, the learned Q-values contain information about the optical properties of multi-layer optical systems, thereby allowing physical interpretation or what-if analysis.
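The following is a minimal sketch (not the authors' code) of how stacking consecutive layers can be framed as a parameterized action: the discrete component picks the next layer's dielectric material, the continuous component its thickness, and the reward is an exponential transform of a figure-of-merit error. The identifiers (MATERIALS, figure_of_merit, the thickness bounds) and the toy merit function are illustrative assumptions, and the stop action that would let the agent choose the total number of layers is omitted for brevity.

```python
import numpy as np

# Assumed discrete action set and continuous parameter bounds (illustrative only).
MATERIALS = ["SiO2", "TiO2", "MgF2"]
THICKNESS_RANGE = (10e-9, 300e-9)  # meters

def figure_of_merit(stack):
    """Toy stand-in for an optical solver (e.g. a transfer-matrix method):
    here, just the relative mismatch between the stack's total thickness and
    a 1 micron target. A real solver would compare simulated and target spectra."""
    total = sum(thickness for _, thickness in stack)
    return abs(total - 1e-6) / 1e-6

class MultiLayerEnv:
    """Each step executes one parameterized action: stack a layer of a chosen
    material (discrete) with a chosen thickness (continuous)."""

    def __init__(self, max_layers=20):
        self.max_layers = max_layers
        self.stack = []

    def reset(self):
        self.stack = []
        return tuple(self.stack)

    def step(self, material_idx, thickness):
        thickness = float(np.clip(thickness, *THICKNESS_RANGE))
        self.stack.append((MATERIALS[material_idx], thickness))
        done = len(self.stack) >= self.max_layers
        # Exponentially transformed reward: errors near zero map to rewards
        # near one, sharpening the signal close to the optimum (the exact
        # transform used in the paper may differ).
        reward = float(np.exp(-figure_of_merit(self.stack)))
        return tuple(self.stack), reward, done

env = MultiLayerEnv()
env.reset()
state, reward, done = env.step(material_idx=1, thickness=120e-9)
```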
Reinforcement Learning with Parameterized Actions
Masson, Warwick (University of the Witwatersrand) | Ranchod, Pravesh (University of the Witwatersrand) | Konidaris, George (Duke University)
We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions—discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
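As a rough illustration of the alternating scheme (not Masson et al.'s implementation), the sketch below runs Q-PAMDP's two phases on a single-state problem: first, with the parameter policies frozen, it updates Q-values over the discrete actions; then, with the greedy discrete choice fixed, it improves that action's parameter policy by direct policy search. The Gaussian parameter policies, the finite-difference gradient, and the toy reward are simplifying assumptions; the paper uses Sarsa(λ) and more general policy-search methods in full MDPs.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3
Q = np.zeros(N_ACTIONS)       # Q-value per discrete action (single-state case)
theta = np.zeros(N_ACTIONS)   # mean of each action's Gaussian parameter policy

def reward(a, x):
    """Toy reward: discrete action a scores best when its continuous
    parameter x equals a + 1 (stand-in for goal-scoring/Platform dynamics)."""
    return -(x - (a + 1.0)) ** 2

for _ in range(200):
    # Phase 1: parameter policies fixed; epsilon-greedy Q-updates over the
    # discrete actions (the paper uses Sarsa(lambda) in the sequential case).
    for _ in range(20):
        a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(np.argmax(Q))
        x = theta[a] + 0.5 * rng.standard_normal()  # sample continuous parameter
        Q[a] += 0.1 * (reward(a, x) - Q[a])

    # Phase 2: discrete policy fixed at the greedy action; improve that
    # action's parameter policy by direct policy search (finite differences).
    a = int(np.argmax(Q))
    eps = 1e-2
    grad = (reward(a, theta[a] + eps) - reward(a, theta[a] - eps)) / (2 * eps)
    theta[a] += 0.05 * grad

a = int(np.argmax(Q))
print(f"greedy action {a}, learned parameter mean {theta[a]:.2f}")
# The run settles on whichever action first looks best and tunes only its
# parameter policy, illustrating the local-optimum convergence the abstract states.
```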