Supplementary material: Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

Apr-24-2026, 16:31:23 GMT–Neural Information Processing Systems

We will use the well-known Performance Difference Lemma [16] in our analysis. We can obtain a performance difference lemma for the meta-policies as follows. Here, (a) follows from Assumption 3.1, from which we have P ...

In this section, we describe all the simulation and real-world environments in detail.

B.1 Simulation Environments

Point 2D Navigation: Point 2D Navigation [9] is a two-dimensional goal-reaching environment with $\mathcal{S} \subset \mathbb{R}^2$, $\mathcal{A} \subset \mathbb{R}^2$, and the following dynamics:

$$x_{t+1} = x_t + dx_t, \qquad y_{t+1} = y_t + dy_t, \qquad \text{such that } dx_t^2 + dy_t^2 \le 0.1^2,$$

where $x_t$ and $y_t$ are the x and y locations of the agent, and $dx_t$ and $dy_t$ are the actions taken, corresponding to the displacement in the x and y directions respectively, all at time step $t$. The goals are located on a semicircle of radius 2, and the episode terminates when the agent reaches the goal or spends more than 100 time steps in the environment.
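The Point 2D Navigation dynamics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The step bound of 0.1, the goal radius of 2, and the 100-step limit come from the text. Everything else is an assumption made for illustration: the class name `Point2DNavigation`, the start at the origin, the upper semicircle, the goal tolerance `GOAL_TOLERANCE`, the reward of 1 on reaching the goal, and the rescaling of oversized actions onto the allowed disc.

```python
import math
import random

MAX_STEP = 0.1           # from the text: dx_t^2 + dy_t^2 <= 0.1^2
GOAL_RADIUS = 2.0        # from the text: goals lie on a semicircle of radius 2
MAX_EPISODE_STEPS = 100  # from the text; read here as "at most 100 steps"
GOAL_TOLERANCE = 0.2     # assumption: distance at which the goal counts as reached


class Point2DNavigation:
    """Hypothetical sketch of a 2D point-mass goal-reaching task."""

    def __init__(self, goal_angle=None, seed=None):
        rng = random.Random(seed)
        if goal_angle is None:
            # Assumption: goals are sampled uniformly on the upper semicircle.
            goal_angle = rng.uniform(0.0, math.pi)
        self.goal = (GOAL_RADIUS * math.cos(goal_angle),
                     GOAL_RADIUS * math.sin(goal_angle))
        self.reset()

    def reset(self):
        # Assumption: the agent starts at the origin.
        self.x, self.y, self.t = 0.0, 0.0, 0
        return (self.x, self.y)

    def step(self, dx, dy):
        # Rescale the action so that dx^2 + dy^2 <= MAX_STEP^2 (assumption:
        # oversized actions are projected rather than rejected).
        norm = math.hypot(dx, dy)
        if norm > MAX_STEP:
            dx, dy = dx * MAX_STEP / norm, dy * MAX_STEP / norm

        # Dynamics: x_{t+1} = x_t + dx_t, y_{t+1} = y_t + dy_t.
        self.x += dx
        self.y += dy
        self.t += 1

        distance = math.hypot(self.x - self.goal[0], self.y - self.goal[1])
        reached = distance <= GOAL_TOLERANCE
        reward = 1.0 if reached else 0.0  # assumption: sparse 0/1 reward
        done = reached or self.t >= MAX_EPISODE_STEPS
        return (self.x, self.y), reward, done
```

A rollout calls `reset()` once and then `step(dx, dy)` until `done` is true. With the goal placed straight above the origin and the agent always moving upward at full step size, the episode ends with reward 1 well before the 100-step limit.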
