On Training of Kolmogorov-Arnold Networks

Nov-7-2024–arXiv.org Artificial Intelligence

For the last decade, the Perceptron [1] has served as the de facto building block of deep neural networks that use a fully connected layer in their architecture. Recent breakthroughs in large language models are due partly to large stacks of Transformer [2] units, which also use Perceptrons as part of their internal machinery. The popularity of Perceptron-based architectures can be attributed mainly to two properties: the flexibility to learn non-linear functions [3], and an inherent amenability to parallelization on modern GPU architectures [4]. While Perceptron-free architectures have shown impressive performance [5], the fully connected Perceptron layer remains a staple of high-performing models. Recently, a viable alternative to the Perceptron unit for fully connected layers has been proposed: the Kolmogorov-Arnold unit [6].
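To make the contrast between the two kinds of fully connected unit concrete, here is a minimal sketch in plain Python. The Perceptron layer puts learnable scalar weights on the edges and applies a fixed nonlinearity at each output node (tanh is assumed here). The Kolmogorov-Arnold layer instead puts a learnable univariate function phi_ji on every input-output edge and simply sums them at the node. The original Kolmogorov-Arnold proposal [6] parameterizes these edge functions with B-splines; this sketch swaps in fixed Gaussian radial basis functions for brevity, so the basis choice, the grid, and the names `perceptron_layer`, `rbf_basis`, and `ka_layer` are all hypothetical illustrations, not the paper's implementation.

```python
import math
import random


def perceptron_layer(x, W, b):
    """Fully connected Perceptron layer: y_j = tanh(sum_i W[j][i] * x[i] + b[j]).

    Learnable scalars sit on the edges; a fixed nonlinearity (tanh, assumed)
    is applied at each output node.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def rbf_basis(t, centers, width):
    """Fixed Gaussian bumps evaluated at scalar t (assumed stand-in for B-splines)."""
    return [math.exp(-((t - c) / width) ** 2) for c in centers]


def ka_layer(x, coeffs, centers, width):
    """Kolmogorov-Arnold style layer: y_j = sum_i phi_ji(x_i).

    Each edge function phi_ji is a learnable linear combination of the fixed
    basis; coeffs has shape [n_out][n_in][n_basis]. No node nonlinearity.
    """
    basis = [rbf_basis(xi, centers, width) for xi in x]  # shared by all outputs
    out = []
    for edge_coeffs in coeffs:  # one row of edges per output j
        yj = 0.0
        for c_ji, phi_i in zip(edge_coeffs, basis):
            yj += sum(c * p for c, p in zip(c_ji, phi_i))  # phi_ji(x_i)
        out.append(yj)
    return out


if __name__ == "__main__":
    random.seed(0)
    n_in, n_out, n_basis = 3, 2, 5
    # Hypothetical grid: basis centers evenly spaced on [-1, 1].
    centers = [-1.0 + 2.0 * k / (n_basis - 1) for k in range(n_basis)]
    width = 2.0 / (n_basis - 1)
    W = [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    coeffs = [[[random.gauss(0.0, 0.1) for _ in range(n_basis)]
               for _ in range(n_in)] for _ in range(n_out)]
    x = [0.5, -0.2, 0.9]
    print("Perceptron:", perceptron_layer(x, W, b))
    print("KA layer:  ", ka_layer(x, coeffs, centers, width))
```

With 3 inputs, 2 outputs, and 5 basis functions per edge, the Perceptron layer above has 2 × (3 + 1) = 8 parameters while the KA layer has 2 × 3 × 5 = 30, because every edge carries its own function rather than a single scalar weight.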

architecture, artificial intelligence, machine learning
