Review for NeurIPS paper: Non-parametric Models for Non-negative Functions
Relation to Prior Work: 1) Intuitively, the proposed model seems hugely over-parametrized (O(n^2) parameters!) for the described purpose of modeling non-negative functions. Indeed, in the proof of Theorem 3, to obtain a cc-universal approximator, it suffices to take an operator A of the form A = ww^T. From a statistical perspective, a preferable model would simply be f_w(x) = (w^T \phi(x))^2. The benefit of allowing A to be full-rank is convexity, which makes the model easier to fit. The prior knowledge that the optimization problem has an exact rank-1 solution is presumably the motivation for imposing a nuclear norm constraint. I think clarifying this logic would help motivate the model, as well as the elastic net regularization proposed in (6). I am confused about why one would fix the bandwidth.
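The contrast between the two parametrizations can be sketched numerically. The snippet below is a minimal illustration (not the paper's implementation, and the feature map and data are hypothetical): it checks that the full-rank model f_A(x) = \phi(x)^T A \phi(x) reduces to the rank-1 model f_w(x) = (w^T \phi(x))^2 when A = ww^T, and that a PSD choice of A keeps the output non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # dimension of the (hypothetical) finite feature map

def phi(x):
    # Hypothetical cosine feature map, standing in for the kernel features.
    freqs = np.arange(1, d + 1)
    return np.cos(freqs * x) / np.sqrt(d)

x = 0.3
w = rng.standard_normal(d)

# Rank-1 model: O(n) parameters, non-convex in w, always non-negative.
f_rank1 = (w @ phi(x)) ** 2

# Full-rank model: O(n^2) parameters via a PSD matrix A, convex in A.
A = np.outer(w, w)                 # rank-1 PSD choice A = w w^T
f_full = phi(x) @ A @ phi(x)

assert np.isclose(f_rank1, f_full)  # the two models agree when A = w w^T
assert f_full >= 0                  # PSD A guarantees non-negativity
```

This makes concrete why the full-rank A trades statistical economy for a convex objective, which is the tension the review asks the authors to spell out.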