Stable Diffusion for Reinforcement Learning
If the video generated from our suggested prompt isn't of satisfactory quality, we could decompose the problem further, for example by prompting with "a video of Mario expertly jumping over a chasm". In this way every sequence of actions is associated with language, allowing us to reason about actions at multiple resolutions, for example with the prompt "Mario expertly jumping over a chasm, and then crouching, moving right twice, left once, and then completing the remainder of the level". Can Stable Diffusion be used to fully resolve abstract action spaces? If Mario expertly jumps over a chasm, what is the atomic sequence of game inputs that realizes this abstract action? Something is missing: Stable Diffusion lets us go from text to video, but now we need to go from text to text (from an abstract action description to its atomic input sequence) while keeping the generated video in view.
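To make that missing step concrete, here is a minimal sketch, assuming a hypothetical table `PRIMITIVES` that maps short phrases to hypothetical button names; nothing here comes from a real emulator or from Stable Diffusion. A language plan is written as (phrase, repeat) steps. Phrases that already name atomic inputs ("crouch", "move right") expand mechanically. Abstract phrases like the chasm jump have no fixed expansion and fall through as unresolved, which is exactly the text-to-text mapping still to be learned.

```python
from typing import Dict, List, Tuple

# Hypothetical atomic game inputs (an assumption; a real emulator's button set may differ).
PRIMITIVES: Dict[str, List[str]] = {
    "crouch": ["DOWN"],
    "move right": ["RIGHT"],
    "move left": ["LEFT"],
    "jump": ["A"],
}


def expand(plan: List[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Expand (phrase, repeat) steps into atomic inputs.

    Returns (atomic_sequence, unresolved_phrases). Any phrase without a
    known expansion is collected in unresolved_phrases instead of being
    guessed at: these are the abstract actions that still need a
    text-to-text mapping down to atomic inputs.
    """
    actions: List[str] = []
    unresolved: List[str] = []
    for phrase, repeat in plan:
        atoms = PRIMITIVES.get(phrase)
        if atoms is None:
            unresolved.append(phrase)
            continue
        # Repeat the primitive's atomic inputs, e.g. "move right" x2.
        actions.extend(atoms * repeat)
    return actions, unresolved


plan = [
    ("expertly jump over a chasm", 1),
    ("crouch", 1),
    ("move right", 2),
    ("move left", 1),
    ("complete the remainder of the level", 1),
]
actions, unresolved = expand(plan)
print(actions)     # ['DOWN', 'RIGHT', 'RIGHT', 'LEFT']
print(unresolved)  # ['expertly jump over a chasm', 'complete the remainder of the level']
```

The lookup table is deliberately naive: it only handles phrases that are already atomic, so everything interesting ends up in `unresolved`.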
Feb-4-2023, 10:45:43 GMT