Appendix for "Learning Neural Set Functions Under the Optimal Subset Oracle"

B Derivations

The probabilistic greedy model (PGM) solves optimization (1) with a differentiable extension of the greedy maximization algorithm (Tschiatschek et al., 2018). Because the greedy policy adds elements one at a time, the resulting probability is permutation variant: different orderings of the same set S are assigned different probabilities. To alleviate this problem, Tschiatschek et al. (2018) construct the set mass function by enumerating all possible permutations \pi of S:

p(S; \theta) = \sum_{\pi \in \Pi_S} \prod_{i=1}^{|S|} p(\pi_i \mid S_\pi^{<i}; \theta),    (14)

where \Pi_S denotes the set of all |S|! permutations of S, S_\pi^{<i} = \{\pi_1, \dots, \pi_{i-1}\}, and p(\pi_i \mid S_\pi^{<i}; \theta) is the probability that the greedy policy selects \pi_i given the elements chosen so far. However, maximizing the log likelihood of (14) is prohibitively expensive and unscalable due to the exponential time complexity of enumerating all permutations. Although one can apply Monte Carlo approximation to avoid this, i.e., approximating \log p(S; \theta) with a small number of sampled permutations, the resulting estimator is biased: by Jensen's inequality, the expected logarithm of a sampled sum only lower-bounds the true log likelihood.
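To make the cost of evaluating (14) concrete, the following is a minimal numerical sketch. The ground set V, the toy utility F, the fixed budget |S| = k, and the softmax-over-marginal-gains step policy are illustrative assumptions for this sketch, not the exact model of Tschiatschek et al. (2018).

```python
import math
from itertools import permutations

# Illustrative ground set; an assumption for this sketch.
V = frozenset(range(6))

def F(S):
    """Toy monotone utility: concave (sqrt) transform of a modular sum."""
    return math.sqrt(sum(i + 1 for i in S))

def step_prob(item, chosen):
    """Probability that a softmax-greedy policy picks `item` next,
    given the already-chosen elements (softmax over marginal gains)."""
    gains = {j: F(chosen | {j}) - F(chosen) for j in V - chosen}
    z = sum(math.exp(g) for g in gains.values())
    return math.exp(gains[item]) / z

def pgm_set_prob(S):
    """Exact p(S) as in Eq. (14): sum over all |S|! orderings of S,
    assuming a fixed budget |S| = k as in budgeted greedy maximization."""
    total = 0.0
    for pi in permutations(S):
        chosen, prob = frozenset(), 1.0
        for item in pi:
            prob *= step_prob(item, chosen)
            chosen |= {item}
        total += prob
    return total

S = frozenset({0, 2, 5})
print(f"enumerated {math.factorial(len(S))} orderings, p(S) = {pgm_set_prob(S):.6f}")
# Already for |S| = 20 the sum ranges over 20! (about 2.4e18) orderings,
# which is why maximizing the exact log likelihood of (14) does not scale.
```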
B.1 Derivations of the Maximum Entropy Distribution

The first step in solving problem (2) is to construct a proper set mass function p(S; \theta). A natural question is: what is the most appropriate set mass function? Generally, we prefer a model that assumes nothing about what is unknown. More formally, we should choose the most "uniform" distribution, namely the one that maximizes the Shannon entropy

H(p) = -\sum_{S \subseteq V} p(S; \theta) \log p(S; \theta).

This principle is known as the "noninformative prior" (Jeffreys, 1946), which has been widely applied in many physical systems (Jaynes, 1957a,b). It turns out that the energy-based model is the only distribution with maximum entropy under a moment constraint on the utility. More specifically, the following theorem holds:

Theorem 1. Among all set mass functions p over 2^V satisfying \sum_{S \subseteq V} p(S) = 1 and the moment constraint E_p[F(S; \theta)] = c, the entropy H(p) is uniquely maximized by the Gibbs (energy-based) distribution p(S; \theta) \propto \exp(\lambda F(S; \theta)), where the multiplier \lambda is determined by c; absorbing \lambda into F yields p(S; \theta) = \exp(F(S; \theta)) / \sum_{S' \subseteq V} \exp(F(S'; \theta)).
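A brief sketch of the standard Lagrangian argument behind Theorem 1, spelled out here for completeness (notation as above):

```latex
\begin{align*}
% Maximize H(p) subject to the moment and normalization constraints,
% via the Lagrangian with multipliers \lambda and \mu:
\mathcal{L}(p, \lambda, \mu)
  &= -\sum_{S \subseteq V} p(S)\log p(S)
     + \lambda \Big( \sum_{S \subseteq V} p(S)\, F(S;\theta) - c \Big)
     + \mu \Big( \sum_{S \subseteq V} p(S) - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial p(S)}
  &= -\log p(S) - 1 + \lambda F(S;\theta) + \mu = 0
  \;\Longrightarrow\;
  p(S) = \exp\big(\lambda F(S;\theta) + \mu - 1\big).
\end{align*}
% Normalization fixes \mu, giving the Gibbs form
% p(S;\theta) = \exp(\lambda F(S;\theta)) / \sum_{S'} \exp(\lambda F(S';\theta));
% strict concavity of H over the convex feasible set makes the maximizer unique.
```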
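The claim can also be checked numerically. The sketch below (with an assumed toy utility F on a four-element ground set, not the paper's learned utility) builds the energy-based distribution over all subsets and perturbs it along random directions that preserve both constraints; the entropy never increases, consistent with Theorem 1.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Toy ground set and utility; illustrative assumptions for this sketch.
V = list(range(4))
subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]

def F(S):
    return np.sqrt(sum(i + 1 for i in S))  # toy monotone utility

f = np.array([F(S) for S in subsets])

# Energy-based model of Theorem 1 (with \lambda absorbed): p(S) = exp(F(S)) / Z.
p = np.exp(f)
p /= p.sum()

def entropy(q):
    return -np.sum(q * np.log(q))

# Perturb p along random directions in the null space of the constraint
# matrix, so normalization and the moment E_q[F] are both preserved.
A = np.stack([np.ones_like(f), f])              # constraint matrix (2 x 16)
P = np.eye(len(f)) - A.T @ np.linalg.pinv(A.T)  # projector onto its null space
for _ in range(5):
    d = P @ rng.normal(size=len(f))
    q = p + 1e-3 * d / np.linalg.norm(d)        # small feasible perturbation
    assert np.isclose(q.sum(), 1) and np.isclose(q @ f, p @ f)
    print(f"H(p) - H(q) = {entropy(p) - entropy(q):.3e}  (>= 0)")
```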