f5f3b8d720f34ebebceb7765e447268b-Supplemental.pdf
Neural Information Processing Systems
For all $k \in \mathbb{N}^+$ with $k \le d_{st}(s, \phi^{-1}(g)) = n$, let $\tilde{g} = \phi(s_k)$, and let $\tilde{\tau} = (s_0, s_1, s_2, \dots, s_k)$ be the $k$-step sub-trajectory of $\tau$ from $s_0$ to $s_k$. Using the triangle inequality, we can prove that the sub-trajectory $\tilde{\tau} = (s_0, s_1, s_2, \dots, s_k)$ is also a shortest trajectory from $s_0 = s$ to $s_k$: assume that this is not true and there exists a shorter trajectory from $s_0$ to $s_k$.

Using Theorem 1, we have that for each subgoal $g_{kt}$, $t = 0, 1, \dots, T-1$, there exists a subgoal $\tilde{g}_{kt} \in G_A(s_{kt}, k)$ that can induce the same low-level $k$-step action sequence as $g_{kt}$.

When the temporal distance between two states in one trajectory is not larger than $k$, the corresponding element in the adjacency matrix is set to 1, indicating adjacency.

The main differences between our method and theirs are: 1) we use trajectories sampled by multiple policies to construct training samples, while they only use trajectories sampled by one specific policy; 2) we use an adjacency matrix to explicitly aggregate the adjacency information and sample training pairs based on the adjacency matrix, while they directly sample training pairs from trajectories.
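The adjacency-matrix labelling and pair sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a small discrete state space in which states are integer indices, and the names `build_adjacency_matrix` and `sample_pairs` are hypothetical. The symmetric labelling and the uniform sampling of pairs are also assumptions made for the sketch.

```python
import random


def build_adjacency_matrix(trajectories, num_states, k):
    """Set entry (i, j) to 1 if states i and j appear at most k steps
    apart in at least one trajectory; all other entries stay 0."""
    adj = [[0] * num_states for _ in range(num_states)]
    for traj in trajectories:
        for i, s_i in enumerate(traj):
            # Look at s_i itself and at most k steps ahead of it.
            for s_j in traj[i : i + k + 1]:
                adj[s_i][s_j] = 1
                adj[s_j][s_i] = 1  # assumption: adjacency is labelled symmetrically
    return adj


def sample_pairs(adj, num_pairs, rng):
    """Sample (state_i, state_j, label) training pairs from the matrix
    rather than directly from trajectories (uniform sampling is an assumption)."""
    n = len(adj)
    pairs = []
    for _ in range(num_pairs):
        i, j = rng.randrange(n), rng.randrange(n)
        pairs.append((i, j, adj[i][j]))
    return pairs


# Trajectories collected by two hypothetical policies over states 0..4.
trajectories = [[0, 1, 2, 3], [3, 4]]
adj = build_adjacency_matrix(trajectories, num_states=5, k=2)
print(adj[0][2], adj[0][3], adj[2][4])  # 1 0 0
training_pairs = sample_pairs(adj, num_pairs=4, rng=random.Random(0))
```

Because the matrix aggregates information from every trajectory, a pair labelled adjacent by any one policy's rollout keeps that label regardless of which trajectories are seen later. States 2 and 4 stay non-adjacent in this example only because they never occur within $k$ steps of each other in a single trajectory.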
Feb-11-2026, 03:37:26 GMT