Network size and size of the weights in memorization with two-layers neural networks

Oct-9-2024, 22:41:21 GMT–Neural Information Processing Systems

In 1988, Eric B. Baum showed that two-layer neural networks with a threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$ neurons. We observe that with ReLU networks, using four times as many neurons, one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon)\right)$ neurons (assuming that the data is also well dispersed). We show, however, that these constructions give rise to networks in which the magnitude of the neurons' weights is far from optimal. In contrast, we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons and nearly optimal size of the weights.
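The claim that four ReLU neurons per group of $d$ points suffice for real labels can be illustrated with a small sketch. The abstract does not spell out the construction, so what follows is one plausible reading and not necessarily the authors' method: for each group, take a hyperplane $u \cdot x = 1$ through its $d$ points, take a linear map $v$ with $v \cdot x_i = y_i$ on the group, and use two finite differences of ReLUs to obtain a function equal to $v \cdot x$ on a thin slab around the hyperplane and zero at the other data points. The helper names (`solve`, `fit_group`, `memorize`, `predict`) and the choices of `tau` and `delta` are assumptions made for illustration. The sketch also assumes that $n$ is a multiple of $d$, that each group is linearly independent, and that no other data point lies on a group's hyperplane.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def relu(t):
    return max(t, 0.0)


def fit_group(points, labels, group):
    """Return four (weights, bias, output_coefficient) ReLU neurons for one group."""
    X = [points[i] for i in group]
    u = solve(X, [1.0] * len(group))           # hyperplane u.x = 1 through the group
    v = solve(X, [labels[i] for i in group])   # linear map with v.x_i = y_i on the group
    others = [p for i, p in enumerate(points) if i not in group]
    # Slab half-width: half the smallest gap between the hyperplane and any other point.
    tau = 0.5 * min((abs(dot(u, p) - 1.0) for p in others), default=1.0)
    # Step size small enough that delta * |v.x| < tau for every data point.
    scale = max(abs(dot(v, p)) for p in points)
    delta = 0.5 * tau / max(scale, 1e-12)
    w = [ui + delta * vi for ui, vi in zip(u, v)]
    lo, hi = 1.0 - tau, 1.0 + tau
    # g(x) = f_lo(x) - f_hi(x), with f_c(x) = (relu(w.x - c) - relu(u.x - c)) / delta
    return [(w, -lo, 1.0 / delta), (u, -lo, -1.0 / delta),
            (w, -hi, -1.0 / delta), (u, -hi, 1.0 / delta)]


def memorize(points, labels, d):
    """Build a two-layer ReLU network (list of neurons) with 4 * n / d neurons."""
    assert len(points) % d == 0, "sketch assumes n is a multiple of d"
    neurons = []
    for start in range(0, len(points), d):
        neurons += fit_group(points, labels, list(range(start, start + d)))
    return neurons


def predict(neurons, x):
    return sum(c * relu(dot(w, x) + b) for w, b, c in neurons)


# Toy data: n = 4 points in R^2 with arbitrary real labels.
points = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
labels = [0.5, -1.0, 2.0, 3.0]
net = memorize(points, labels, d=2)
print(len(net))  # 8 neurons, i.e. 4 * ceil(n / d)
print([round(predict(net, p), 6) for p in points])
```

In this sketch the output coefficients scale like `1 / delta`, and `delta` shrinks as data points crowd a group's hyperplane, so the weights can become large. This is consistent in spirit with the abstract's remark that such constructions are far from optimal in weight magnitude, although the sketch makes no attempt to quantify it.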

artificial intelligence, machine learning, two-layer neural network

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Memory-Based Learning > Rote Learning (0.91)
  - Neural Networks (1.00)