Appendix
–Neural Information Processing Systems
Let x be a root ofG(,θ), i.e., G(x,θ) = 0. Frank-Wolfeimplementations typically maintain theconvexweights ofthe vertices, which we use to get an approximation ofp?(θ). As a more advanced example, we now describe how to implement the KKT conditions(6). In all experiments, we only show how to compute gradients with the outer objective. There areseveral ways to differentiate this projection. The first is to use the KKT conditions as detailed in 2.2.
Neural Information Processing Systems
Feb-7-2026, 21:25:47 GMT
- Technology: