Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
–Neural Information Processing Systems
We study finite sample expressivity, i.e., the memorization power of ReLU networks. Recent results require $N$ hidden nodes to memorize/interpolate arbitrary $N$ data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points. We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, establishing tight bounds on memorization capacity. The sufficiency result extends to deeper networks: we show that an $L$-layer network with $W$ parameters in the hidden layers can memorize $N$ data points if $W = \Omega(N)$.
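As a rough consistency check between the width and parameter-count statements (the notation below is ours, not the paper's): a 3-layer ReLU network with input dimension $d_x$ treated as a constant, hidden widths $d_1 = d_2 = \Theta(\sqrt{N})$, and scalar output has

$$W \;=\; \underbrace{d_x d_1 + d_1}_{\text{first hidden layer}} \;+\; \underbrace{d_1 d_2 + d_2}_{\text{second hidden layer}} \;=\; \Theta(N)$$

hidden-layer parameters, with the $d_1 d_2$ term dominating. So the $\Theta(\sqrt{N})$-width construction uses $\Theta(N)$ hidden-layer parameters, which is consistent with the general sufficient condition $W = \Omega(N)$ stated above.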