Reinforcement Learning on Multiple Correlated Signals

Brys, Tim (Vrije Universiteit Brussel) | Nowé, Ann (Vrije Universiteit Brussel)

AAAI Conferences 

[…] conflicts may exist between objectives, and there is in general a need to identify (a set of) tradeoff solutions. The set of optimal, i.e. non-dominated, incomparable solutions is called the Pareto-front. We identify multi-objective problems with correlated objectives (CMOP) as a specific subclass of multi-objective problems, defined to contain those MOPs whose Pareto-front is so limited that one can barely speak of tradeoffs (Brys et al. 2014b). By consequence, the system designer does not care about which of the very […] as potential-based reward shaping functions (heuristic signals guiding exploration) (Brys et al. 2014a). We prove that this modification preserves the total order, and thus also optimality, of policies, mainly relying on the results by Ng, Harada, and Russell (1999). This insight - that any MDP can be framed as a CMOMDP - significantly increases the importance of this problem class, as well as techniques developed for it, as these could be used to solve regular single-objective MDPs faster and better, provided several meaningful shapings can be devised.
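The shaping construction the abstract relies on can be illustrated with a small sketch. This is not the authors' implementation: the chain MDP, the potential function phi, and all hyperparameters below are assumptions chosen for illustration. The one element taken from the cited result (Ng, Harada, and Russell 1999) is the form of the shaping reward, F(s, s') = gamma * phi(s') - phi(s), which is added to the environment reward and provably preserves the optimal policy.

```python
import random

# Minimal sketch of potential-based reward shaping (not the authors' code).
# Deterministic chain MDP: states 0..N-1, actions 0 = left, 1 = right;
# reaching state N-1 yields reward 1 and ends the episode.
N = 6
GAMMA = 0.95
ALPHA = 0.5
EPISODES = 300

def step(s, a):
    s2 = min(N - 1, s + 1) if a == 1 else max(0, s - 1)
    reward = 1.0 if s2 == N - 1 else 0.0
    return s2, reward, s2 == N - 1

def phi(s):
    # Assumed heuristic potential: normalized closeness to the goal.
    return s / (N - 1)

def q_learn(shaped, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(EPISODES):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < 0.1 else max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            if shaped:
                # F(s, s') = gamma * phi(s') - phi(s): the potential-based
                # shaping term that leaves the optimal policy unchanged.
                r += GAMMA * phi(s2) - phi(s)
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

Q = q_learn(shaped=True)
# The greedy policy should move right (toward the goal) in every non-terminal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
print(policy)
```

Because the shaping term telescopes along any trajectory, it changes returns by a state-dependent constant only, so the ordering of policies is untouched; the agent merely receives denser feedback and tends to find the goal sooner.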
