Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes. The main idea is to compute a decision-making policy in one environment and use it in a different environment, provided the two are "close enough". In this paper, we use bisimulation-style metrics (Ferns et al., 2004) to guide knowledge transfer. We propose algorithms that decide which actions to transfer from a policy computed on a small MDP to a larger one, given the bisimulation distance between states in the two tasks. We demonstrate the inherent "pessimism" of bisimulation metrics and present variants of this metric aimed at overcoming this pessimism, leading to improved action transfer. We also show that this approach is more successful when used to transfer temporally extended actions (Sutton et al., 1999) than when used exclusively with primitive actions. We present theoretical guarantees on the quality of the transferred policy, as well as promising empirical results.
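The bisimulation metric of Ferns et al. (2004) is the fixed point of d(s,t) = max_a [c_R |R(s,a) - R(t,a)| + c_T K(d)(P(.|s,a), P(.|t,a))], where K(d) is the Kantorovich distance between next-state distributions under d. The following is a minimal Python sketch under simplifying assumptions: transitions are deterministic (so the Kantorovich term reduces to d(s', t')), both tasks share one action set, and distances between source and target states are computed on the disjoint union of the two MDPs. The transfer rule shown, in which each target state copies the action of its nearest source state, is a hypothetical baseline for illustration and not one of the algorithms proposed in the paper; the function names and the weights c_r = 0.1, c_t = 0.9 are assumptions.

```python
import numpy as np


def bisim_metric_deterministic(R, T, c_r=0.1, c_t=0.9, tol=1e-8, max_iter=10_000):
    """Fixed-point iteration for a bisimulation metric on a deterministic MDP.

    R: (n_states, n_actions) array of rewards R(s, a).
    T: (n_states, n_actions) int array of next states T(s, a).
    Assumption: transitions are deterministic, so the Kantorovich term
    reduces to d(T(s, a), T(t, a)).
    """
    n = R.shape[0]
    d = np.zeros((n, n))
    # |R(s,a) - R(t,a)| for every pair (s, t) and action a: shape (n, n, n_actions)
    reward_diff = np.abs(R[:, None, :] - R[None, :, :])
    for _ in range(max_iter):
        # d(T(s,a), T(t,a)) for every pair (s, t) and action a
        next_dist = d[T[:, None, :], T[None, :, :]]
        # Maximum over actions of the weighted reward and transition terms
        d_new = np.max(c_r * reward_diff + c_t * next_dist, axis=2)
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d


def transfer_nearest(d, n_src, source_policy):
    """Hypothetical baseline: each target state copies the action of its closest source state.

    States [0, n_src) of the union are source states; the rest are target states.
    """
    cross = d[:n_src, n_src:]  # distances, source states x target states
    nearest_src = np.argmin(cross, axis=0)
    return source_policy[nearest_src]


# Toy source MDP: 2 states, 2 actions
R_src = np.array([[1.0, 0.0], [0.0, 0.0]])
T_src = np.array([[0, 1], [1, 0]])
source_policy = np.array([0, 1])

# Toy target MDP: 4 states; states 0, 1 behave like source state 0, states 2, 3 like source state 1
R_tgt = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
T_tgt = np.array([[0, 2], [1, 3], [2, 0], [3, 0]])

# Disjoint union: shift target state indices past the source states
n_src = R_src.shape[0]
R = np.vstack([R_src, R_tgt])
T = np.vstack([T_src, T_tgt + n_src])

d = bisim_metric_deterministic(R, T)
transferred = transfer_nearest(d, n_src, source_policy)
print(transferred)  # [0 0 1 1]
```

In this toy example each target state is an exact behavioral copy of one source state, so its distance to that state is 0 and the nearest-neighbor rule recovers the source policy. Because the metric takes a maximum over all actions, two states can be far apart even when they agree on the actions the source policy would actually take; this is one way to read the "pessimism" mentioned above.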
Context transfer, as defined in this article, is knowledge transfer between tasks that share the same environment dynamics and reward function but have different state and action spaces. For example, consider a mobile robot already working in an environment. At some point, we decide to upgrade its sensors and/or actuators. Any change in these modules results in a different description of the agent-environment model, so the knowledge the robot has learned is no longer applicable. We treat the tasks of the old and new robots as the source and target tasks, respectively. Under certain conditions, the Markov decision processes (MDPs) of these tasks are called Q-transferable tasks, and the problem of knowledge transfer between them is called context transfer. We investigate the relationship between the MDPs of these tasks.
As knowledge transfer research progresses from single transfer to lifelong learning scenarios, it becomes increasingly important to properly select the source knowledge that would best transfer to the target task. In this position paper, we describe our previous work on selective knowledge transfer and relate it to problems in lifelong learning. We also briefly discuss our ongoing work to develop lifelong learning methods capable of continual transfer between tasks and of incorporating guidance from an expert human user.
Robots capable of growing their knowledge and learning new tasks are of great interest. We formalize knowledge transfer in human-robot interactions and establish a testing framework for it. As a proof of concept, we implement a robot system that not only learns in real time from human demonstrations but also transfers this knowledge.
Transfer learning leverages the knowledge in one domain, the source domain, to improve learning efficiency in another domain, the target domain. Existing transfer learning research is relatively well advanced, but only in settings where the feature spaces of the domains are homogeneous and the target domain contains at least a few labeled instances. Transfer learning has not been well studied in heterogeneous settings with an unlabeled target domain. To contribute to research in this emerging field, this paper presents: (1) an unsupervised knowledge transfer theorem that prevents negative transfer; and (2) a principal angle-based metric to measure the distance between two pairs of domains. The metric shows the extent to which the homogeneous representations have preserved the information in the original source and target domains. The unsupervised knowledge transfer theorem sets out the transfer conditions necessary to prevent negative transfer. Linear monotonic maps meet the transfer conditions of the theorem and are therefore used to construct homogeneous representations of the heterogeneous domains, which in principle prevents negative transfer. The metric and the theorem have been implemented in a transfer model, called the Grassmann-LMM-geodesic flow kernel (GLG), that is specifically designed for knowledge transfer across heterogeneous domains. The GLG model learns homogeneous representations of heterogeneous domains by minimizing the proposed metric. Knowledge is then transferred through these learned representations via a geodesic flow kernel. Notably, the theorem presented in this paper provides sufficient transfer conditions that guarantee knowledge is transferred correctly from a source domain to an unlabeled target domain.
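Principal angles are the standard way to compare two linear subspaces, and the GLG metric is described as principal angle-based. The abstract does not give the metric's exact form (it compares two pairs of domains), so the following Python sketch only shows the building block: the principal angles between two subspaces of a common ambient space, computed from the singular values of the product of their orthonormal bases, and the standard Grassmann geodesic distance derived from them. The function names and the choice of this particular distance are assumptions for illustration, not the GLG metric itself.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    A: (n, k) and B: (n, l) arrays whose columns span two subspaces of R^n
    (assumed to have full column rank).
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles (descending)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip guards against round-off pushing a cosine slightly outside [-1, 1]
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def grassmann_distance(A, B):
    """Geodesic distance between equal-dimension subspaces on the Grassmann manifold.

    Computed as the square root of the sum of squared principal angles.
    """
    theta = principal_angles(A, B)
    return float(np.sqrt(np.sum(theta ** 2)))


# Two planes in R^3 sharing one direction: span{e1, e2} and span{e1, e3}
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(principal_angles(A, B))    # approximately [0, pi/2]
print(grassmann_distance(A, B))  # approximately pi/2
```

In practice the columns of A and B might be, for example, PCA bases of two domains expressed in the same feature space; small angles indicate closely aligned subspaces. The geodesic flow kernel likewise treats such subspaces as points on a Grassmann manifold.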