PRKAN: Parameter-Reduced Kolmogorov-Arnold Networks
Ta, Hoang-Thang, Thai, Duy-Quy, Tran, Anh, Sidorov, Grigori, Gelbukh, Alexander
MLPs have long been a key component of modern neural network architectures. Their simplicity makes them widely used for capturing complex relationships through multiple layers of non-linear transformations. However, their role has recently been reconsidered with the revival of Kolmogorov-Arnold Networks (KANs) [1, 2]. In these papers, MLPs are characterized as placing fixed activation functions on "nodes," and the authors propose instead placing learnable activation functions, such as B-splines, on "edges" to improve performance on mathematical and physical examples. KANs draw on the Kolmogorov-Arnold Representation Theorem (KART) [4], which was introduced in connection with Hilbert's 13th problem [3] and states that any continuous multivariate function can be represented as a superposition of continuous univariate functions and addition. The work of Liu et al. [1] on KANs has inspired numerous studies that replace B-splines with various basis and polynomial functions [5, 6, 7, 8, 9, 10, 11, 12, 13] and compare the resulting models against MLPs. Several studies have shown that KANs do not always outperform MLPs under the same training configurations [14, 15]. Moreover, even when KANs do outperform MLPs with the same network structure, they often require a significantly larger number of parameters [7, 16, 17, 18, 19].
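For reference, KART is commonly written as the superposition formula

\[
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\]

where \(\phi_{q,p} \colon [0,1] \to \mathbb{R}\) and \(\Phi_q \colon \mathbb{R} \to \mathbb{R}\) are continuous univariate functions; KANs generalize this two-layer structure by making the univariate functions learnable.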
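To make the parameter gap concrete, the following minimal Python sketch (not taken from the paper) compares the parameter count of an MLP layer with that of a KAN layer whose edges carry learnable B-splines. The per-edge coefficient count reflects the O(G + k) scaling reported by Liu et al. [1]; the layer widths, grid size, and exact per-edge bookkeeping are illustrative assumptions, since implementations differ.

    # Illustrative parameter-count comparison between an MLP layer and a
    # KAN layer. Treat the exact numbers as a rough sketch: the per-edge
    # spline coefficient count (grid_size + spline_order) follows the
    # scaling in Liu et al. [1], but real implementations vary.

    def mlp_layer_params(n_in: int, n_out: int) -> int:
        # One weight per edge plus one bias per output node.
        return n_in * n_out + n_out

    def kan_layer_params(n_in: int, n_out: int,
                         grid_size: int = 5, spline_order: int = 3) -> int:
        # Each edge carries a learnable B-spline with
        # (grid_size + spline_order) coefficients, plus one base weight
        # (an assumed bookkeeping convention for this sketch).
        coeffs_per_edge = grid_size + spline_order + 1
        return n_in * n_out * coeffs_per_edge

    if __name__ == "__main__":
        n_in, n_out = 784, 64  # illustrative widths, e.g. an MNIST input layer
        print("MLP layer:", mlp_layer_params(n_in, n_out))  # 50,240
        print("KAN layer:", kan_layer_params(n_in, n_out))  # 451,584

With these assumed settings the KAN layer needs roughly nine times as many parameters as the MLP layer of the same width, which is the gap that parameter-reduction methods such as PRKAN aim to close.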