 indicator function


A Note on Non-Negative $L_1$-Approximating Polynomials

arXiv.org Machine Learning

$L_1$-approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of \textit{non-negative} $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but a weaker one than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently found uses in smoothed learning from positive-only examples. In this short note, we prove that every class of sets with Gaussian surface area (GSA) at most $\Gamma$ under the standard Gaussian admits degree-$k$ non-negative polynomials that $\varepsilon$-approximate its indicator functions in $L_1$-norm, for $k=\tilde{O}(\Gamma^2/\varepsilon^2)$. Equivalently, finite GSA implies $L_1$-approximation with the stronger pointwise guarantee that the approximating polynomial has range contained in $[0,\infty)$. Up to a constant factor, this matches the best currently known degree bound for Gaussian $L_1$-approximation without the non-negativity constraint.
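The central definition, a polynomial that is pointwise non-negative yet close to an indicator in Gaussian $L_1$-norm, can be made concrete in one dimension. The sketch below is not the note's construction, which the abstract does not describe. Instead it uses an elementary substitute: take the degree-$d$ probabilists' Hermite ($L_2$) projection $q$ of the halfspace indicator $f(x) = 1[x \ge 0]$ under $N(0,1)$ and square it. Then $p = q^2 \ge 0$ everywhere, and since $f = f^2$, Cauchy-Schwarz gives $\mathbb{E}|f - q^2| \le \|f - q\|_2 \, \|f + q\|_2$. So the $L_1$ error shrinks along with the $L_2$ error of $q$, at the cost of doubling the degree. The helper names (`step_hermite_coeffs`, `nonneg_approx`, `l1_error`) are hypothetical. The coefficients use the identity $\mathbb{E}[f\,He_n] = \varphi(0)\,He_{n-1}(0)$.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H

PHI0 = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0


def step_hermite_coeffs(deg):
    """Probabilists' Hermite coefficients c_0..c_deg of the L2(N(0,1)) projection
    of f(x) = 1[x >= 0], using E[f He_n] = phi(0) He_{n-1}(0) and E[He_n^2] = n!."""
    c = np.zeros(deg + 1)
    c[0] = 0.5  # E[f] = P(x >= 0)
    for n in range(1, deg + 1):
        e = np.zeros(n)
        e[-1] = 1.0  # coefficient vector selecting He_{n-1}
        c[n] = PHI0 * H.hermeval(0.0, e) / math.factorial(n)
    return c


def nonneg_approx(deg):
    """Non-negative polynomial p = q^2 (degree at most 2*deg), where q is the
    degree-deg Hermite projection of the step function."""
    q = step_hermite_coeffs(deg)
    return lambda x: H.hermeval(x, q) ** 2


def l1_error(p, lo=-15.0, hi=15.0, num=300_001):
    """Riemann-sum estimate of E|1[x >= 0] - p(x)| for x ~ N(0, 1)."""
    x = np.linspace(lo, hi, num)
    f = (x >= 0).astype(float)
    density = np.exp(-x**2 / 2.0) * PHI0
    return float(np.sum(np.abs(f - p(x)) * density) * (x[1] - x[0]))


for deg in (0, 10, 40):
    print(deg, l1_error(nonneg_approx(deg)))
```

For `deg = 0` the approximation is the constant $1/4$, which gives an $L_1$ error of exactly $1/2$. Larger degrees drive the error down while the output stays in $[0, \infty)$ by construction.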


Appendices

Neural Information Processing Systems

When $e \notin W_\Phi$, we have $E = \mathbb{R}^d$ and $W_{\Phi,E} = W_\Phi$. By Theorem 1 in [10], the projected Bellman equation (3.4) has a unique fixed point $\theta^*$. Thus, $L = \{\theta^*\}$. 2. When $e \in W_\Phi$, $\theta_e$ is the unique solution to $\Phi\theta = e$, as $\Phi$ has full column rank. We first show that the set of solutions to the projected Bellman equation (3.4) takes the form $\{\bar{\theta} + c\theta_e \mid c \in \mathbb{R}\}$, where $\bar{\theta}$ is any solution to (3.4). On the other hand, suppose that $\theta'$ is a solution to (3.4) that is not of the form $\bar{\theta} + c\theta_e$.
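The case $e \in W_\Phi$ can be checked numerically, but only under an assumption the excerpt does not confirm. The assumption is that (3.4) is the average-reward projected Bellman equation of linear TD(0), $\Phi^\top D (r - \bar{r} e + P\Phi\theta - \Phi\theta) = 0$, where $e$ is the all-ones vector, $W_\Phi$ is the column space of $\Phi$, and $D = \mathrm{diag}(\pi)$ for the stationary distribution $\pi$ of an ergodic chain $P$. In that setting $\Phi\theta_e = e$ makes $\theta_e$ a null vector of the linear system, so every $\bar{\theta} + c\theta_e$ solves it. The chain, rewards and features below are random and purely illustrative, and `projected_bellman_system` is a hypothetical helper.

```python
import numpy as np


def projected_bellman_system(P, r, Phi):
    """Write the ASSUMED equation Phi^T D (r - rbar*e + P Phi theta - Phi theta) = 0
    as A theta = b, with D = diag(pi) for the stationary distribution pi of P."""
    n = P.shape[0]
    evals, evecs = np.linalg.eig(P.T)  # pi is the left eigenvector for eigenvalue 1
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()
    D = np.diag(pi)
    rbar = pi @ r  # average reward under pi
    A = Phi.T @ D @ (P - np.eye(n)) @ Phi
    b = -Phi.T @ D @ (r - rbar * np.ones(n))
    return A, b


rng = np.random.default_rng(1)
n, d = 6, 3
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # strictly positive rows, hence ergodic
r = rng.random(n)
# Case e in W_Phi: the first feature is constant, so e lies in the column space.
Phi = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
theta_e = np.linalg.lstsq(Phi, np.ones(n), rcond=None)[0]  # Phi theta_e = e

A, b = projected_bellman_system(P, r, Phi)
theta_bar = np.linalg.lstsq(A, b, rcond=1e-10)[0]  # one particular solution
residuals = [np.linalg.norm(A @ (theta_bar + c * theta_e) - b) for c in (-2.0, 0.0, 3.5)]
print(residuals)
```

Every shift along $\theta_e$ leaves the residual at round-off level. In this assumed setting $A$ also has rank $d - 1$, so its null space is exactly $\mathrm{span}\{\theta_e\}$. That is consistent with the solution set taking the form $\{\bar{\theta} + c\theta_e\}$.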






AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters

Neural Information Processing Systems

To build a better-generalized and easy-to-use pruning method, we propose AutoPrune, which prunes the network by optimizing a set of trainable auxiliary parameters instead of the original weights.
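The excerpt does not give AutoPrune's actual update rule, so the sketch below substitutes a simple assumed variant to show the general idea of pruning through auxiliary parameters. Each frozen pre-trained weight $w_j$ of a linear model gets a hypothetical auxiliary parameter $m_j$, and the effective weight is $w_j \cdot \mathrm{sigmoid}(m_j)$. Only $m$ is trained, on mean squared error plus a penalty $\lambda \sum_j \mathrm{sigmoid}(m_j)$ that pushes gates toward zero. Weights with $m_j \le 0$ are pruned at the end. The sigmoid relaxation, the penalty, and the names `train_gates` and `keep` are illustrative choices, not the paper's.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_gates(X, y, w, lam=0.1, lr=1.0, steps=1000, m0=2.0):
    """Learn one auxiliary gate parameter m_j per weight while w stays frozen.

    Effective weights are w * sigmoid(m); the objective is
    MSE + lam * sum(sigmoid(m)), so uninformative gates are pushed toward 0.
    """
    n = X.shape[0]
    m = np.full_like(w, m0)
    for _ in range(steps):
        s = sigmoid(m)
        v = w * s                                     # gated (effective) weights
        grad_v = (2.0 / n) * X.T @ (X @ v - y)        # d(MSE)/dv
        grad_m = (grad_v * w + lam) * s * (1.0 - s)   # chain rule through the sigmoid
        m -= lr * grad_m
    return m


rng = np.random.default_rng(0)
n, d = 200, 8
X = np.linalg.qr(rng.standard_normal((n, d)))[0] * np.sqrt(n)  # X^T X = n I
w_true = np.array([2.0, -3.0, 0.0, 0.0, 0.0, 1.5, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(n)
w = np.linalg.lstsq(X, y, rcond=None)[0]  # dense "pre-trained" weights
m = train_gates(X, y, w)
keep = m > 0  # pruning mask read off the auxiliary parameters
print(keep)
```

With this orthogonal design the data term decouples per weight. Gates on informative weights settle near $\mathrm{sigmoid}(m_j) \approx 1 - \lambda/(2w_j^2)$, while gates on near-zero weights are driven negative. In this toy run the mask should keep exactly the three nonzero entries of `w_true`.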


Batched Thompson Sampling

Neural Information Processing Systems

$O(\log\log T)$ expected batch complexity. This is achieved through a dynamic batching strategy that uses the agent's estimates to adaptively increase the batch duration.
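The snippet does not state the batching rule itself. As a rough illustration of batched Thompson sampling in general, the sketch below runs Bernoulli Thompson sampling with Beta(1, 1) priors, where the posterior used for sampling is refreshed only at batch boundaries. A batch ends once the arm just pulled has at least doubled its pull count since the batch began. This doubling rule is a hypothetical stand-in based on pull counts, not the paper's estimate-based rule. It only guarantees at most $K\lfloor\log_2 T\rfloor + 1$ batches for $K$ arms, not the $O(\log\log T)$ expected batch complexity reported above. `batched_thompson` is a hypothetical name.

```python
import numpy as np


def batched_thompson(means, T, seed=0):
    """Bernoulli Thompson sampling whose sampling posterior is frozen within a batch.

    Hypothetical batching rule (not the paper's): a batch ends as soon as the
    arm just pulled has at least doubled its pull count since the batch began.
    Returns (number of batches, pull counts per arm).
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    succ = np.zeros(K)                    # running successes (always up to date)
    fail = np.zeros(K)                    # running failures
    counts = np.zeros(K, dtype=int)
    alpha, beta = np.ones(K), np.ones(K)  # Beta posterior frozen at batch start
    ref = np.ones(K, dtype=int)           # max(pull count at batch start, 1)
    batches = 1
    for t in range(T):
        a = int(np.argmax(rng.beta(alpha, beta)))  # sample from the frozen posterior
        reward = float(rng.random() < means[a])
        counts[a] += 1
        succ[a] += reward
        fail[a] += 1.0 - reward
        if counts[a] >= 2 * ref[a] and t < T - 1:
            # Batch boundary: fold all feedback collected so far into the posterior.
            alpha, beta = 1.0 + succ, 1.0 + fail
            ref = np.maximum(counts, 1)
            batches += 1
    return batches, counts


batches, counts = batched_thompson([0.3, 0.7], T=5000)
print(batches, counts)
```

Within a batch, feedback is recorded but does not affect which arm is sampled. That freezing is what makes the number of policy updates, rather than the horizon $T$, the quantity being bounded.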