Reinforcement Learning on Multiple Correlated Signals
Brys, Tim (Vrije Universiteit Brussel) | Nowé, Ann (Vrije Universiteit Brussel)
[...] conflicts may exist between objectives, there is in general a need to identify (a set of) tradeoff solutions. The set of optimal, i.e. non-dominated, incomparable solutions is called the Pareto-front. We identify multi-objective problems with correlated objectives (CMOP) as a specific subclass of multi-objective problems, defined to contain those MOPs whose Pareto-front is so limited that one can barely speak of tradeoffs (Brys et al. 2014b). By consequence, the system designer does not care about which of the very [...] as potential-based reward shaping functions (heuristic signals guiding exploration) (Brys et al. 2014a). We prove that this modification preserves the total order, and thus also optimality, of policies, mainly relying on the results by Ng, Harada, and Russell (1999). This insight - that any MDP can be framed as a CMOMDP - significantly increases the importance of this problem class, as well as techniques developed for it, as these could be used to solve regular single-objective MDPs faster and better, provided several meaningful shapings can be devised.
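To make the mechanism concrete, the following is a minimal illustrative sketch (not the paper's experimental setup): tabular Q-learning on a small chain MDP, where a heuristic signal is injected as a potential-based shaping term F(s, s') = γΦ(s') − Φ(s), the form shown by Ng, Harada, and Russell (1999) to preserve the optimal policy. The chain environment, the potential Φ (distance-to-goal), and all hyperparameters are assumptions chosen for illustration.

```python
# Sketch only: Q-learning with a potential-based shaping term
#   F(s, s') = gamma * Phi(s') - Phi(s),
# which preserves policy optimality (Ng, Harada and Russell 1999).
# The chain MDP and potential below are illustrative assumptions.
import random

N = 6                # states 0..5; state 5 is the terminal goal
GAMMA = 0.9
ALPHA = 0.1
ACTIONS = (-1, +1)   # move left / move right

def phi(s):
    # Assumed heuristic potential: normalized closeness to the goal.
    return s / (N - 1)

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    reward = 1.0 if s2 == N - 1 else 0.0
    return s2, reward, s2 == N - 1

def q_learn(shaped, episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < 0.1:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: Q[s][i])
            s2, r, done = step(s, ACTIONS[a])
            if shaped:
                r += GAMMA * phi(s2) - phi(s)  # potential-based shaping
            target = r + (0.0 if done else GAMMA * max(Q[s2]))
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

Q = q_learn(shaped=True)
# The greedy policy should move right (action index 1) in every
# non-terminal state, i.e. shaping did not change the optimum.
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N - 1)]
print(policy)
```

The shaping term rewards progress toward the goal at every step, densifying the otherwise sparse reward; because it is potential-based, the greedy policy it induces is the same one unshaped Q-learning would eventually find, only reached faster.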
Jul-14-2014