behaviour space
Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback
Kompatscher, Jan, Shi, Danqing, Varni, Giovanna, Weinkauf, Tino, Oulasvirta, Antti
Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behaviour with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which of two samples they prefer. We present an interactive visualisation that better exploits the human visual ability to compare and explore whole groups of samples. The interface consists of two linked views: 1) an exploration view showing a contextual overview of all sampled behaviours organised in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviours for user queries. Users can efficiently explore large sets of behaviours by iterating between these two views. Additionally, we devised an active learning approach that suggests groups for comparison. Our evaluation on six simulated robotics tasks shows that our approach increases the final rewards by 69.34% and leads to lower error rates and better policies. We release open-source code that can be easily integrated into the RLHF training loop, supporting research on human-AI alignment.
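A common way to turn comparison feedback into a reward model in RLHF is the Bradley-Terry model, where the probability that sample a is preferred over sample b is sigmoid(r(a) - r(b)). The sketch below assumes, purely for illustration, that a groupwise judgement "group A is preferred over group B" is expanded into every pair (a, b) with a in A and b in B. This expansion rule and the function names are hypothetical; the abstract does not say how the authors convert group comparisons into training labels.

```python
import math
from itertools import product


def expand_group_preference(preferred, other):
    """Turn "group `preferred` beats group `other`" into (winner, loser) pairs.

    Hypothetical rule: every member of `preferred` is treated as preferred
    over every member of `other`.
    """
    return list(product(preferred, other))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bradley_terry_loss(reward, pairs):
    """Mean negative log-likelihood of (winner, loser) pairs under Bradley-Terry.

    P(a preferred over b) = sigmoid(reward(a) - reward(b)), hence
    -log P = softplus(reward(b) - reward(a)).
    """
    return sum(softplus(reward(b) - reward(a)) for a, b in pairs) / len(pairs)


# Usage: two groups of two behaviours give four pairwise labels.
pairs = expand_group_preference([2.0, 3.0], [0.0, 1.0])
loss = bradley_terry_loss(lambda x: x, pairs)  # reward agrees with the labels
```

Under this assumption, a single judgement between groups of sizes m and n yields m·n pairwise labels; a reward that ranks the preferred group higher obtains a loss below log 2, the value of a constant reward.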
Behaviour Planning: A Toolkit for Diverse Planning
Abdelwahed, Mustafa F, Espasa, Joan, Toniolo, Alice, Gent, Ian P.
Diverse planning is the problem of generating plans with distinct characteristics. This is valuable for many real-world scenarios, including applications related to plan recognition and business process automation. In this work, we introduce Behaviour Planning, a diverse planning toolkit that can characterise and generate diverse plans based on modular diversity models. We present a qualitative framework for describing diversity models, a planning approach for generating plans aligned with any given diversity model, and a practical implementation of an SMT-based behaviour planner. We showcase how the qualitative approach offered by Behaviour Planning allows it to overcome various challenges faced by previous approaches. Finally, our experimental evaluation shows that Behaviour Planning is effective at generating diverse plans compared with state-of-the-art approaches.
Evolutionary Reinforcement Learning via Cooperative Coevolution
Hu, Chengpeng, Liu, Jialin, Yao, Xin
Recently, evolutionary reinforcement learning has attracted much attention in various domains. By maintaining a population of actors, evolutionary reinforcement learning uses the collected experiences to improve the behaviour policy through efficient exploration. However, the poor scalability of genetic operators limits the efficiency of optimising high-dimensional neural networks. To address this issue, this paper proposes a novel cooperative coevolutionary reinforcement learning (CoERL) algorithm. Inspired by cooperative coevolution, CoERL periodically and adaptively decomposes the policy optimisation problem into multiple subproblems and evolves a population of neural networks for each subproblem. Instead of using genetic operators, CoERL directly searches for partial gradients to update the policy. Updating the policy with partial gradients maintains consistency between the behaviour spaces of parents and offspring across generations. The experiences collected by the population are then used to improve the entire policy, which enhances sampling efficiency. Experiments on six benchmark locomotion tasks demonstrate that CoERL outperforms seven state-of-the-art algorithms and baselines. An ablation study verifies the unique contribution of each of CoERL's core ingredients.
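The abstract describes decomposing the policy parameters into subproblems and searching for partial gradients instead of applying genetic operators. As a rough illustration only, the sketch below randomly partitions a parameter vector into groups and estimates each group's partial gradient with an antithetic evolution-strategies estimator. The estimator, the random re-partitioning at every iteration (a stand-in for CoERL's periodic, adaptive decomposition) and all function names are assumptions; the sketch omits CoERL's per-subproblem populations of neural networks and its reuse of collected experiences.

```python
import random


def decompose(n_params, n_groups, rng):
    """Randomly partition parameter indices into `n_groups` subproblems."""
    idx = list(range(n_params))
    rng.shuffle(idx)
    return [idx[g::n_groups] for g in range(n_groups)]


def partial_gradient(f, theta, group, sigma, n_samples, rng):
    """Antithetic evolution-strategies estimate of df/dtheta on `group` only."""
    grad = {i: 0.0 for i in group}
    for _ in range(n_samples):
        # Perturb only the coordinates belonging to this subproblem.
        eps = {i: rng.gauss(0.0, 1.0) for i in group}
        plus, minus = list(theta), list(theta)
        for i, e in eps.items():
            plus[i] += sigma * e
            minus[i] -= sigma * e
        diff = f(plus) - f(minus)
        for i, e in eps.items():
            grad[i] += diff * e / (2.0 * sigma * n_samples)
    return grad


def coevolve(f, theta, n_groups=3, iterations=50, sigma=0.1,
             n_samples=20, lr=0.1, seed=0):
    """Maximise f by cycling through randomly decomposed subproblems."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        for group in decompose(len(theta), n_groups, rng):
            grad = partial_gradient(f, theta, group, sigma, n_samples, rng)
            for i, g in grad.items():
                theta[i] += lr * g  # gradient ascent on this subproblem only
    return theta


# Usage on a toy objective whose maximum is at all-ones.
f = lambda th: -sum((x - 1.0) ** 2 for x in th)
theta = coevolve(f, [0.0] * 6)
```

Restricting each perturbation to one group keeps the search space of every estimate small, which mirrors the scalability motivation given in the abstract.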
Learning Manner of Execution from Partial Corrections
Appelgren, Mattias, Lascarides, Alex
Some actions must be executed in different ways depending on the context. For example, wiping away marker requires vigorous force, whereas wiping away almonds requires a gentler one. In this paper we provide a model in which an agent learns which manner of action execution to use in which context, drawing on evidence from trial and error and from verbal corrections when it makes a mistake (e.g., "no, gently"). The learner starts out with a domain model that lacks the concepts denoted by the words in the teacher's feedback: both the words describing the context (e.g., marker) and adverbs such as "gently". We show that, through the semantics of coherence, our agent can perform the symbol grounding necessary to exploit the teacher's feedback and so solve its domain-level planning problem: performing its actions in the current context in the right way.
Quality-Diversity Meta-Evolution: customising behaviour spaces to a meta-objective
Bossens, David M., Tarapore, Danesh
However, it was widely known that successfully converging to the maximum of that fitness function requires maintaining genetic diversity in the population of solutions (e.g., [1-4]). Moreover, the use of niching demonstrated how maintaining subpopulations could help find multiple solutions to a single problem [5]. Some studies included genetic diversity as one of the objectives of the EA [6]. Approaches in evolutionary robotics, artificial life, and neuro-evolution realised that genetic diversity does not necessarily imply a diversity of solutions, since (i) different genotypes may encode the same behaviour and vice versa; and (ii) many genotypes may encode unsafe or undesirable solutions that should be discarded during evolution (e.g., when a robot crashes into an obstacle). Such approaches began to emphasise behavioural diversity [7-10], not only as a driver for objective-based evolution but also as the enabler of diversity- or novelty-driven evolution [11]. In quality-diversity (QD) algorithms such as MAP-Elites [12] and Novelty Search with Local Competition [13], the behavioural diversity approach is combined with local competition such that the best solution for each local region in the behaviour space is stored, forming a large archive of high-quality solutions. The development of quality-diversity algorithms has enabled a plethora of applications.
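The local-competition idea behind MAP-Elites can be made concrete: the behaviour space is discretised into cells, and each cell keeps only the best solution found so far. A minimal sketch, assuming genomes and behaviour descriptors both lie in [0, 1]^d, Gaussian mutation, and uniform parent selection from the archive; these are illustrative choices, and the function name and parameters are hypothetical rather than taken from the cited works.

```python
import random


def map_elites(evaluate, dim, n_bins, iterations=2000, n_init=50,
               sigma=0.1, seed=0):
    """Minimal MAP-Elites over genomes in [0, 1]^dim.

    `evaluate(genome)` returns (fitness, descriptor), where the descriptor is
    a tuple of floats in [0, 1]; each descriptor axis is split into `n_bins`.
    """
    rng = random.Random(seed)
    archive = {}  # cell (tuple of bin indices) -> (fitness, genome)

    def insert(genome):
        fitness, desc = evaluate(genome)
        cell = tuple(min(int(d * n_bins), n_bins - 1) for d in desc)
        # Local competition: a cell keeps only its best solution.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)

    # Seed the archive with random genomes.
    for _ in range(n_init):
        insert([rng.random() for _ in range(dim)])
    # Repeatedly mutate a random elite and try to insert the offspring.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in parent]
        insert(child)
    return archive


# Usage: the descriptor is the genome itself, fitness peaks at (0.5, 0.5).
evaluate = lambda g: (-sum((x - 0.5) ** 2 for x in g), tuple(g))
archive = map_elites(evaluate, dim=2, n_bins=5)
```

With `n_bins=5` and a two-dimensional descriptor, the archive holds at most 25 elites, one per cell of a 5 by 5 grid, so the result is a map of high-quality solutions rather than a single optimum.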
Rinascimento: searching the behaviour space of Splendor
The use of Artificial Intelligence (AI) for play-testing remains on the sidelines of the main applications of AI in games, compared with performance-oriented game-playing. One of the main purposes of play-testing a game is gathering data on the gameplay, highlighting good and bad features of the game's design, and providing game designers with useful insights for improving it. Using AI agents has the potential to speed up this process dramatically. The purpose of this research is to map the behavioural space (BSpace) of a game using a general method. Using the MAP-Elites algorithm, we search the hyperparameter space of Rinascimento AI agents and map it to the BSpace defined by several behavioural metrics. This methodology was able to highlight both exemplary and degenerate behaviours in the original game design of Splendor and in two variations. In particular, the use of event-value functions has generally shown a remarkable improvement in BSpace coverage compared with agents based on classic score-based reward signals.
On the use of feature-maps and parameter control for improved quality-diversity meta-evolution
Bossens, David M., Tarapore, Danesh
Historically, most evolutionary algorithms (EAs) were designed to optimise a fitness function, solving a single problem without consideration of generalisation to unseen problems or robustness to perturbations of the evaluation environment. However, it was widely known that successfully converging to the maximum of that fitness function requires maintaining genetic diversity in the population of solutions (see e.g., Laumanns et al. (2002); Gupta and Ghafir (2012); Ursem (2002); Ginley et al. (2011)). Moreover, the use of niching demonstrated how maintaining subpopulations could help find multiple solutions to a single problem (Mahfoud, 1995). Some studies included genetic diversity as one of the objectives of the EA (Toffolo and Benini, 2003). Approaches in evolutionary robotics, artificial life, and neuro-evolution realised that genetic diversity does not necessarily imply a diversity of solutions, since (i) different genotypes may encode the same behaviour and vice versa (especially for complex genotypes such as neural networks); and (ii) many genotypes may encode unsafe or undesirable solutions that should be discarded during evolution (e.g., self-collisions on a multi-joint robot arm). Such approaches began to emphasise behavioural diversity (Mouret and Doncieux, 2009b; Gomez, 2009; Mouret and Doncieux, 2009a; Mouret, 2010), not only as a driver for objective-based evolution but also as the enabler of diversity- or novelty-driven evolution (Lehman and Stanley, 2011a). This work is the extended version of the paper: David M. Bossens & Danesh Tarapore (2021). On the use of feature-maps for improved quality-diversity meta-evolution.