terminal state
Learning Shortest Paths with Generative Flow Networks
Morozov, Nikita, Maksimov, Ian, Tiapkin, Daniil, Samsonov, Sergey
In this paper, we present a novel learning framework for finding shortest paths in graphs utilizing Generative Flow Networks (GFlowNets). First, we examine theoretical properties of GFlowNets in non-acyclic environments in relation to shortest paths. We prove that, if the total flow is minimized, forward and backward policies traverse the environment graph exclusively along shortest paths between the initial and terminal states. Building on this result, we show that the pathfinding problem in an arbitrary graph can be solved by training a non-acyclic GFlowNet with flow regularization. We experimentally demonstrate the performance of our method in pathfinding in permutation environments and in solving Rubik's Cubes. For the latter problem, our approach shows competitive results with state-of-the-art machine learning approaches designed specifically for this task in terms of the solution length, while requiring smaller search budget at test-time.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- North America > Canada (0.04)
- (2 more...)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Information Technology (0.67)
- Leisure & Entertainment (0.45)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Europe > Sweden > Skåne County > Malmö (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Learning to Discover Skills through Guidance Hyunseung Kim,1 Byungkun Lee,1 Hojoon Lee
However, we have identified that the effectiveness of these rewards declines as the environmental complexity rises. Therefore, we present a novel USD algorithm, skill disco very with gui dance ( DISCO-DANCE), which (1) selects the guide skill that possesses the highest potential to reach unexplored states, (2) guides other skills to follow guide skill, then (3) the guided skills are dispersed to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark.
2bde8fef08f7ebe42b584266cbcfc909-Paper-Conference.pdf
To do so, we extend to neural activity the maximum occupancy principle (MOP) developed for behavior, and refer to this new neural principle asNeuroMOP.NeuroMOP posits thatthegoal ofthenervoussystem istomaximize future action-state entropy, a reward-free, intrinsic motivation that entails creating allpossible activity patterns while avoiding terminal ordangerous ones.
- Europe > Sweden (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- (6 more...)
- North America > United States (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)