
 terminal state


Learning Shortest Paths with Generative Flow Networks

Nikita Morozov, Ian Maksimov, Daniil Tiapkin, Sergey Samsonov

arXiv.org Machine Learning

In this paper, we present a novel learning framework for finding shortest paths in graphs utilizing Generative Flow Networks (GFlowNets). First, we examine theoretical properties of GFlowNets in non-acyclic environments in relation to shortest paths. We prove that, if the total flow is minimized, forward and backward policies traverse the environment graph exclusively along shortest paths between the initial and terminal states. Building on this result, we show that the pathfinding problem in an arbitrary graph can be solved by training a non-acyclic GFlowNet with flow regularization. We experimentally demonstrate the performance of our method in pathfinding in permutation environments and in solving Rubik's Cubes. For the latter problem, our approach is competitive in solution length with state-of-the-art machine learning approaches designed specifically for this task, while requiring a smaller search budget at test time.





Learning to Discover Skills through Guidance

Hyunseung Kim, Byungkun Lee, Hojoon Lee

Neural Information Processing Systems

However, we have identified that the effectiveness of these rewards declines as the environmental complexity rises. Therefore, we present a novel USD algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects the guide skill that possesses the highest potential to reach unexplored states, (2) guides the other skills to follow the guide skill, and then (3) disperses the guided skills to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark.



2bde8fef08f7ebe42b584266cbcfc909-Paper-Conference.pdf

Neural Information Processing Systems

To do so, we extend to neural activity the maximum occupancy principle (MOP) developed for behavior, and refer to this new neural principle as NeuroMOP. NeuroMOP posits that the goal of the nervous system is to maximize future action-state entropy, a reward-free, intrinsic motivation that entails creating all possible activity patterns while avoiding terminal or dangerous ones.




Temporally-Consistent Survival Analysis

Neural Information Processing Systems

We model the event of interest as a special terminal state, and we seek to estimate the survival distribution (i.e., the distribution of the hitting time for that terminal state) from any other state.
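To make the "hitting time of a terminal state" framing concrete, here is a minimal sketch assuming a known discrete-time Markov chain with row-stochastic transition matrix P (an illustrative assumption; the snippet above does not say the dynamics are known, and this is not the paper's estimator). With T the first time the terminal state is reached, the survival function S_t(s) = Pr(T > t | s_0 = s) obeys the one-step recursion S_t(s) = Σ_{s'} P(s, s') S_{t-1}(s') for non-terminal s, with S_0(s) = 1, and Pr(T = t | s_0 = s) = S_{t-1}(s) − S_t(s). The function name `hitting_time_distribution` and the toy chain are hypothetical.

```python
import numpy as np


def hitting_time_distribution(P, terminal, horizon):
    """Hypothetical helper: hitting-time distribution of `terminal` from every state.

    Assumes P is a known row-stochastic transition matrix (illustration only).
    Returns (H, S) where
      S[t, s]     = Pr(T > t | s_0 = s) for t = 0..horizon  (survival function)
      H[t - 1, s] = Pr(T = t | s_0 = s) for t = 1..horizon  (hitting-time pmf)
    The terminal state itself has T = 0, so its column is all zeros.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]

    # Sub-stochastic kernel over non-terminal states: entering the terminal
    # state ends survival, and the terminal state contributes no survival mass.
    Q = P.copy()
    Q[:, terminal] = 0.0
    Q[terminal, :] = 0.0

    # S_0(s) = 1 for non-terminal s, 0 for the terminal state.
    s_t = np.ones(n)
    s_t[terminal] = 0.0
    survival = [s_t]
    for _ in range(horizon):
        # One-step recursion: S_t(s) = sum_{s'} Q(s, s') S_{t-1}(s')
        survival.append(Q @ survival[-1])

    S = np.array(survival)  # shape (horizon + 1, n)
    H = S[:-1] - S[1:]      # Pr(T = t) = S_{t-1} - S_t, shape (horizon, n)
    return H, S


# Toy chain (hypothetical): 0 -> 1 surely; 1 -> 2 (terminal) w.p. 0.5, else stays.
P_example = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
H, S = hitting_time_distribution(P_example, terminal=2, horizon=3)
print(H[:, 0])  # hitting-time pmf from state 0 for t = 1, 2, 3
```

Computing S first and differencing keeps every estimate tied to the same recursion across time steps. In this toy chain, state 0 cannot reach the terminal state in one step, so its pmf starts at zero and then decays geometrically.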