On Training of Kolmogorov-Arnold Networks
arXiv.org Artificial Intelligence
Over the last decade, the Perceptron [1] has served as the de facto building block of deep neural networks that use fully connected layers. Recent breakthroughs in large language models are due in part to large stacks of Transformer [2] units, which also use Perceptrons as part of their internal machinery. The popularity of Perceptron-based architectures can be attributed mainly to two things: their flexibility in learning non-linear functions [3], and their inherent ability to parallelize on modern GPU architectures [4]. While Perceptron-free architectures have been shown to achieve impressive performance [5], the fully connected Perceptron layer remains a staple of high-performing models. Recently, a viable alternative to the Perceptron unit for fully connected layers has been proposed: the Kolmogorov-Arnold unit [6].
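To make the contrast between the two units concrete, the sketch below compares a Perceptron layer, which applies a fixed non-linearity after a learned linear map, with a Kolmogorov-Arnold layer, in which every input-to-output edge carries its own learnable univariate function and each output simply sums those edge outputs. This is a minimal, forward-pass-only illustration under stated assumptions, not the method of [6] or of this paper: the edge functions here are degree-1 B-splines (hat functions) on a fixed uniform grid, with no residual base activation and no training loop. The function names `perceptron_layer`, `hat_basis`, and `ka_layer` are hypothetical.

```python
import math


def perceptron_layer(x, weights, biases, act=math.tanh):
    """Perceptron layer: y_j = act(sum_i W[j][i] * x[i] + b[j]).
    Learnable parameters are the weights and biases; the activation is fixed."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def hat_basis(t, grid):
    """Degree-1 B-spline (hat) basis values of scalar t on a uniform grid.
    Assumption: a fixed uniform grid; inside the grid range the values sum to 1."""
    h = grid[1] - grid[0]
    return [max(0.0, 1.0 - abs(t - g) / h) for g in grid]


def ka_layer(x, coeffs, grid):
    """Kolmogorov-Arnold layer (sketch): y_j = sum_i phi_ji(x_i), where
    phi_ji(t) = sum_k coeffs[j][i][k] * B_k(t) is a learnable univariate function.
    The learnable parameters are the spline coefficients on each edge (i -> j)."""
    out = []
    for per_output in coeffs:            # one output node j
        total = 0.0
        for xi, c in zip(x, per_output):  # one edge i -> j
            basis = hat_basis(xi, grid)
            total += sum(ck * bk for ck, bk in zip(c, basis))
        out.append(total)
    return out


if __name__ == "__main__":
    x = [0.25, -0.5]
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    # Edge 0 interpolates the identity; edge 1 is the constant 1 on [-1, 1].
    coeffs = [[list(grid), [1.0] * len(grid)]]
    print(ka_layer(x, coeffs, grid))                     # [1.25]
    print(perceptron_layer(x, [[1.0, 1.0]], [0.0]))      # [tanh(-0.25)]
```

The design difference shows up in the parameter count: a Perceptron layer with `n_in` inputs and `n_out` outputs learns `n_out * n_in` weights plus `n_out` biases, while this Kolmogorov-Arnold sketch learns `n_out * n_in * K` coefficients for a grid of `K` points, trading more parameters per edge for learnable non-linearities.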
Nov-7-2024