On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
Yirui Zhou, Xiaowei Liu, Xiaofeng Zhang, Yangchun Zhang
Imitation learning (IL) (Pomerleau, 1991; Ng et al., 2000; Syed and Schapire, 2007; Ho and Ermon, 2016), a realm distinct from standard reinforcement learning (RL) (Puterman, 2014; Sutton and Barto, 2018), does not depend on rewards provided by the environment. This characteristic makes IL particularly well suited to numerous real-world applications (Bhattacharyya et al., 2018; Shi et al., 2019; Jabri, 2021). The general IL paradigm leverages guidance from expert demonstrations, which contain information on both states and actions, to imitate an expert-level policy (Abbeel and Ng, 2004; Ho and Ermon, 2016; Kostrikov et al., 2020). Based on the policy training strategy, IL is divided into two main schemes: on-policy and off-policy training. The on-policy scheme (Ho and Ermon, 2016; Chen et al., 2020) is noted for its stability but requires a large volume of samples.
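To make the reward-free character of IL concrete, the sketch below shows the simplest instance of learning from expert state-action demonstrations: tabular behavioral cloning, where the policy is recovered purely from demonstrated data by taking the most frequent expert action in each state. This is an illustrative toy example, not the adversarial on-policy or off-policy methods discussed above; the state/action names are hypothetical.

```python
from collections import Counter, defaultdict

def behavioral_cloning(demonstrations):
    """Fit a tabular policy from expert (state, action) pairs.

    No environment reward is used: the policy is recovered purely
    from demonstrated state-action data, here by majority vote
    over the expert's actions in each state.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each visited state, pick the most frequent expert action.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical expert demonstrations on a 3-state toy task.
demos = [(0, "right"), (0, "right"),
         (1, "up"), (1, "up"), (1, "down"),
         (2, "left")]
policy = behavioral_cloning(demos)
print(policy)  # -> {0: 'right', 1: 'up', 2: 'left'}
```

Methods such as GAIL (Ho and Ermon, 2016) go beyond this supervised scheme by matching the learner's state-action distribution to the expert's through environment interaction, which is where the on-policy versus off-policy distinction arises.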
Jan-22-2025