Thomaz, Andrea L

Policy Shaping with Human Teachers

AAAI Conferences

In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine whether the algorithm is suitable for human-generated data on two different boards in a Pac-Man domain, comparing performance to an oracle that provides critique based on one known winning policy. Perhaps surprisingly, we show that the data generated by our 26 participants yields even better performance for the agent than data generated by the oracle. This might be because humans do not discourage exploring multiple winning policies. Additionally, we evaluate the impact of different verbal instructions and different interpretations of silence, finding that the usefulness of the data is affected both by which instructions are given to teachers and by how the data is interpreted.
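
To make the point about interpreting silence concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the update rule, so the sketch assumes a form commonly used in the policy shaping literature, in which the probability that an action is optimal depends on the net feedback count and an assumed teacher consistency C. The names `feedback_delta`, `prob_optimal`, `silence_as`, and the label strings are hypothetical.

```python
def feedback_delta(labels, silence_as=0):
    """Net feedback for one state-action pair.

    'good' counts +1 and 'bad' counts -1. None marks a step where the
    teacher said nothing; silence_as is the ASSUMED value given to it
    (0 = ignore silence, +1 = treat as approval, -1 = as disapproval).
    """
    value = {"good": 1, "bad": -1, None: silence_as}
    return sum(value[label] for label in labels)


def prob_optimal(delta, consistency=0.8):
    """Probability that the action is optimal given net feedback delta.

    ASSUMPTION: uses C**delta / (C**delta + (1 - C)**delta), where C is
    the assumed probability that a single feedback label is correct.
    """
    c = consistency
    return c ** delta / (c ** delta + (1 - c) ** delta)


# The same feedback log under three interpretations of silence.
log = ["good", None, None, "bad", "good"]
for silence_as in (0, 1, -1):
    delta = feedback_delta(log, silence_as)
    print(silence_as, delta, round(prob_optimal(delta), 3))
```

Under these assumptions, the same log moves the estimate toward or away from the action depending only on how silence is scored, which is one way the interpretation of the data can change its usefulness.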

Representing Skill Demonstrations for Adaptation and Transfer

AAAI Conferences

We address two classes of skill transfer problems encountered by an autonomous robot: within-domain adaptation and cross-domain transfer. Our aim is to provide skill representations that enable transfer in each problem class. To that end, we explore two approaches to skill representation, each addressing one problem class. The first representation, based on mimicking, encodes the full demonstration and is well suited for within-domain adaptation. The second representation, based on imitation, encodes a set of key points along the trajectory, which represent the goal points most relevant to the successful completion of the skill. This representation enables both within-domain and cross-domain transfer. A planner is then applied to these constraints, generating a domain-specific trajectory that addresses the transfer task.
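
The contrast between the two representations can be sketched as follows. This is an illustrative toy under stated assumptions, not the method from the paper: trajectories are lists of 2D points, the key point indices are chosen by hand, and straight-line interpolation stands in for the planner. The names `extract_keypoints` and `plan_through` are hypothetical.

```python
def extract_keypoints(trajectory, indices):
    """Key point representation: keep only the selected goal points.

    ASSUMPTION: the indices are supplied by hand; how key points are
    actually selected is not described in the abstract.
    """
    return [trajectory[i] for i in indices]


def plan_through(keypoints, steps_per_segment=4):
    """Stand-in planner: linear interpolation between consecutive key points."""
    path = []
    for (x0, y0), (x1, y1) in zip(keypoints, keypoints[1:]):
        for s in range(steps_per_segment):
            t = s / steps_per_segment
            path.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    path.append(keypoints[-1])  # finish exactly on the final key point
    return path


# Full demonstration (the mimicking-style representation keeps all of it).
demo = [(0, 0), (1, 0.3), (2, 0.1), (4, 0), (4, 1), (4.2, 3), (4, 4)]

# Key point representation: start, corner, and goal only.
keypoints = extract_keypoints(demo, [0, 3, 6])

# Hypothetical new domain: the same key points at twice the scale.
scaled = [(2 * x, 2 * y) for x, y in keypoints]
new_path = plan_through(scaled)
print(len(new_path), new_path[0], new_path[-1])
```

The full demonstration ties the skill to the original trajectory, while the key points act as constraints that a planner can satisfy with a new, domain-specific path.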