Text-Aware Diffusion for Policy Learning

Neural Information Processing Systems 

Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning.
