Appendix for "Learning Neural Set Functions Under the Optimal Subset Oracle"

B Derivations

The probabilistic greedy model (PGM) solves optimization (1) with a differentiable extension of the greedy maximization algorithm (Tschiatschek et al., 2018). Because the greedy policy adds elements one at a time, the resulting probability is permutation variant: different orderings of the same set S are assigned different probabilities. To alleviate this problem, Tschiatschek et al. (2018) construct the set mass function by enumerating all possible permutations \pi of S:

p(S; \theta) = \sum_{\pi \in \Pi_S} \prod_{i=1}^{|S|} p(\pi_i \mid S_\pi^{<i}; \theta),    (14)

where \Pi_S denotes the set of all |S|! permutations of S, S_\pi^{<i} = \{\pi_1, \dots, \pi_{i-1}\}, and p(\pi_i \mid S_\pi^{<i}; \theta) is the probability that the greedy policy selects \pi_i given the elements chosen so far. However, maximizing the log likelihood of (14) is prohibitively expensive and unscalable due to the exponential time complexity of enumerating all permutations. Although one can apply Monte Carlo approximation to avoid this, i.e., approximating \log p(S; \theta) with a small number of sampled permutations, the resulting estimator is biased: by Jensen's inequality, the expected logarithm of a sampled sum only lower-bounds the true log likelihood.
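To make the cost of evaluating (14) concrete, the following is a minimal numerical sketch. The ground set V, the toy utility F, the fixed budget |S| = k, and the softmax-over-marginal-gains step policy are illustrative assumptions for this sketch, not the exact model of Tschiatschek et al. (2018).

```python
import math
from itertools import permutations

# Illustrative ground set; an assumption for this sketch.
V = frozenset(range(6))

def F(S):
    """Toy monotone utility: concave (sqrt) transform of a modular sum."""
    return math.sqrt(sum(i + 1 for i in S))

def step_prob(item, chosen):
    """Probability that a softmax-greedy policy picks `item` next,
    given the already-chosen elements (softmax over marginal gains)."""
    gains = {j: F(chosen | {j}) - F(chosen) for j in V - chosen}
    z = sum(math.exp(g) for g in gains.values())
    return math.exp(gains[item]) / z

def pgm_set_prob(S):
    """Exact p(S) as in Eq. (14): sum over all |S|! orderings of S,
    assuming a fixed budget |S| = k as in budgeted greedy maximization."""
    total = 0.0
    for pi in permutations(S):
        chosen, prob = frozenset(), 1.0
        for item in pi:
            prob *= step_prob(item, chosen)
            chosen |= {item}
        total += prob
    return total

S = frozenset({0, 2, 5})
print(f"enumerated {math.factorial(len(S))} orderings, p(S) = {pgm_set_prob(S):.6f}")
# Already for |S| = 20 the sum ranges over 20! (about 2.4e18) orderings,
# which is why maximizing the exact log likelihood of (14) does not scale.
```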
B.1 Derivations of the Maximum Entropy Distribution

The first step in solving problem (2) is to construct a proper set mass function p(S; \theta). A natural question is: what is the most appropriate set mass function? Generally, we prefer a model that assumes nothing about what is unknown. More formally, we should choose the most "uniform" distribution, namely the one that maximizes the Shannon entropy

H(p) = -\sum_{S \subseteq V} p(S; \theta) \log p(S; \theta).

This principle is known as the "noninformative prior" (Jeffreys, 1946), which has been widely applied in many physical systems (Jaynes, 1957a,b). It turns out that the energy-based model is the only distribution with maximum entropy under a moment constraint on the utility. More specifically, the following theorem holds:

Theorem 1. Among all set mass functions p over 2^V satisfying \sum_{S \subseteq V} p(S) = 1 and the moment constraint E_p[F(S; \theta)] = c, the entropy H(p) is uniquely maximized by the Gibbs (energy-based) distribution p(S; \theta) \propto \exp(\lambda F(S; \theta)), where the multiplier \lambda is determined by c; absorbing \lambda into F yields p(S; \theta) = \exp(F(S; \theta)) / \sum_{S' \subseteq V} \exp(F(S'; \theta)).
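A brief sketch of the standard Lagrangian argument behind Theorem 1, spelled out here for completeness (notation as above):

```latex
\begin{align*}
% Maximize H(p) subject to the moment and normalization constraints,
% via the Lagrangian with multipliers \lambda and \mu:
\mathcal{L}(p, \lambda, \mu)
  &= -\sum_{S \subseteq V} p(S)\log p(S)
     + \lambda \Big( \sum_{S \subseteq V} p(S)\, F(S;\theta) - c \Big)
     + \mu \Big( \sum_{S \subseteq V} p(S) - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial p(S)}
  &= -\log p(S) - 1 + \lambda F(S;\theta) + \mu = 0
  \;\Longrightarrow\;
  p(S) = \exp\big(\lambda F(S;\theta) + \mu - 1\big).
\end{align*}
% Normalization fixes \mu, giving the Gibbs form
% p(S;\theta) = \exp(\lambda F(S;\theta)) / \sum_{S'} \exp(\lambda F(S';\theta));
% strict concavity of H over the convex feasible set makes the maximizer unique.
```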
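The claim can also be checked numerically. The sketch below (with an assumed toy utility F on a four-element ground set, not the paper's learned utility) builds the energy-based distribution over all subsets and perturbs it along random directions that preserve both constraints; the entropy never increases, consistent with Theorem 1.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Toy ground set and utility; illustrative assumptions for this sketch.
V = list(range(4))
subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]

def F(S):
    return np.sqrt(sum(i + 1 for i in S))  # toy monotone utility

f = np.array([F(S) for S in subsets])

# Energy-based model of Theorem 1 (with \lambda absorbed): p(S) = exp(F(S)) / Z.
p = np.exp(f)
p /= p.sum()

def entropy(q):
    return -np.sum(q * np.log(q))

# Perturb p along random directions in the null space of the constraint
# matrix, so normalization and the moment E_q[F] are both preserved.
A = np.stack([np.ones_like(f), f])              # constraint matrix (2 x 16)
P = np.eye(len(f)) - A.T @ np.linalg.pinv(A.T)  # projector onto its null space
for _ in range(5):
    d = P @ rng.normal(size=len(f))
    q = p + 1e-3 * d / np.linalg.norm(d)        # small feasible perturbation
    assert np.isclose(q.sum(), 1) and np.isclose(q @ f, p @ f)
    print(f"H(p) - H(q) = {entropy(p) - entropy(q):.3e}  (>= 0)")
```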