 Perceptrons


Explainable Machine Learning: An Illustration of Kolmogorov-Arnold Network Model for Airfoil Lift Prediction

arXiv.org Artificial Intelligence

Data science has emerged as the fourth paradigm of scientific exploration. However, many machine learning models operate as black boxes, offering limited insight into the reasoning behind their predictions. This lack of transparency is a drawback when trying to generate new knowledge from data. Recently, the Kolmogorov-Arnold Network (KAN) has been proposed as an alternative model with built-in explainability. This study demonstrates the potential of KAN for new scientific exploration. KAN, along with five other popular supervised machine learning models, is applied to the well-known problem of airfoil lift prediction in aerospace engineering. Standard data generated in an earlier study on 2900 different airfoils is used. KAN performed best, with an R² score of 96.17 percent on the test data, surpassing both the baseline model and the multilayer perceptron. The explainability of KAN is shown by pruning and symbolizing the model, which yields an equation for the coefficient of lift in terms of the input variables. The explainable information retrieved from the KAN model is found to be consistent with the known physics of lift generation by an airfoil, demonstrating its potential to aid scientific exploration.


A Genetic Algorithm-Based Approach for Automated Optimization of Kolmogorov-Arnold Networks in Classification Tasks

arXiv.org Artificial Intelligence

To address the issue of interpretability in multilayer perceptrons (MLPs), Kolmogorov-Arnold Networks (KANs) were introduced in 2024. However, optimizing KAN structures is labor-intensive, typically requiring manual intervention and parameter tuning. This paper proposes GA-KAN, a genetic algorithm-based approach that automates the optimization of KANs, requiring no human intervention in the design process. To the best of our knowledge, this is the first time that evolutionary computation has been explored to optimize KANs automatically. Furthermore, inspired by the use of sparse connectivity in MLPs to reduce the number of parameters, GA-KAN explores sparse connectivity to tackle the challenge of extensive parameter spaces in KANs. GA-KAN is validated on two toy datasets, achieving optimal results without the manual tuning required by the original KAN. Additionally, GA-KAN demonstrates superior performance across five classification datasets, outperforming traditional methods on all datasets and providing interpretable symbolic formulae for the Wine and Iris datasets, thereby enhancing model transparency. GA-KAN also significantly reduces the number of parameters relative to the standard KAN across all five datasets. The core contributions of GA-KAN include automated optimization, a new encoding strategy, and a new decoding process, which together improve accuracy and interpretability and reduce the number of parameters.


Reviews: A Flexible Generative Framework for Graph-based Semi-supervised Learning

Neural Information Processing Systems

This work employs techniques developed in the network science literature, such as the latent space model (LSM) and the stochastic block model (SBM), to propose a generative model for features X, outputs Y, and graph G, and it uses graph neural networks to approximate the posterior of missing outputs given X, observed Y, and G. This work is a wise combination of recent methods to effectively address the problem of graph-based semi-supervised learning. However, I have some concerns, which are summarized as follows:
- Although the paper proposes a new and interesting generative method for graph-based semi-supervised learning, it is not especially novel, as it uses existing methods such as LSM, SBM, GCN, and GAT as its building blocks.
- It seems the model is only generative for G given X and Y. The other part is factorized as p(Y, X) = p(Y | X) p(X), and p(Y | X) is modeled via a multi-layer perceptron, which is a discriminative model. That is why the authors discard X in all the analyses, as in any other discriminative model, and say that everything is conditioned on X. I think this makes the proposed model not fully generative. It is only generative for G but not for X and Y.


Review for NeurIPS paper: Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Neural Information Processing Systems

Summary and Contributions: This paper is about compressed sensing (CS) under generative priors. In such a problem, undersampled linear measurements of a signal of interest are provided, and the signal is sought. The mathematical ambiguity is resolved by finding the feasible point that is in the range of a trained generative model (such as a GAN), which is itself computed by solving an empirical risk minimization. Existing theory establishes a convergence guarantee of an efficient algorithm under an appropriate random model for the weights of the generative prior. The convergence guarantee assumes that the generative model is a multilayer perceptron where the width of each layer grows log-linearly.


Analysis of Zero Day Attack Detection Using MLP and XAI

arXiv.org Artificial Intelligence

Any exploit taking advantage of a zero-day vulnerability is called a zero-day attack. Previous research and social media trends show a massive demand for research in zero-day attack detection. This paper analyzes Machine Learning (ML) and Deep Learning (DL) based approaches to creating Intrusion Detection Systems (IDS) and scrutinizes them using Explainable AI (XAI) by training an explainer on randomly sampled data from the testing set. The focus is on the KDD99 dataset, which is the most researched of the datasets used for detecting zero-day attacks. The paper aims to synthesize the dataset to have fewer classes for multi-class classification, test ML and DL approaches on pattern recognition, establish the robustness and dependability of the model, and establish the interpretability and scalability of the model. We evaluated the performance of four multilayer perceptron (MLP) models trained on the KDD99 dataset: baseline ML models, weighted ML models, truncated ML models, and weighted truncated ML models. Our results demonstrate that the truncated ML model achieves the highest accuracy (99.62%), precision, and recall, while the weighted truncated ML model shows lower accuracy (97.26%) but better class representation (less bias) across all classes, with an improved unweighted recall score. We also used SHapley Additive exPlanations (SHAP) to train an explainer for our truncated models to check feature importance in the weighted and unweighted models.


Reviews: Rethinking Kernel Methods for Node Representation Learning on Graphs

Neural Information Processing Systems

After rebuttal: thank you for the additional experiments. They strengthen the empirical contribution of the paper, so I've increased my score to a 7.

Originality: The paper is a novel combination of known techniques: by reinterpreting the iterative node aggregation procedure of Kipf et al.'s GCN as a feature smoothing technique, the authors develop a novel feature mapping function for learning positive semi-definite (psd) graph kernels. The key difference from the Kipf et al. approach is that they separate the node aggregation and non-linear representation learning components: node features are the output of a multi-layer perceptron and are then aggregated once (rather than at every layer) by a multi-hop aggregation function. They argue theoretically that this approach is universal in the sense that it can approximate any invertible psd kernel.

Quality: I thought the empirical results of the paper were interesting because they suggest that decoupling the aggregation and representation learning components of GCN-style models leads to better performance (at least on these datasets).


Review for NeurIPS paper: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

Neural Information Processing Systems

Specifically, their rigorous results concern the model Y_i = sign(\langle X_i, w^* \rangle) for i = 1, 2, ..., n, with a Gaussian prior on w^* and Gaussian X_i, both living in d dimensions. They assume n/d = \alpha (constant) as n and d grow to infinity. In [10], the Bayes-optimal reconstruction error was studied (verifying a statistical physics prediction), and here they discuss the performance of regularized ERM (potentially convex methods) in achieving it. Their first set of results concerns the performance of any \ell_2-regularized convex loss, showing that it can be tracked using a fixed-point equation. The result is based on Gordon's minimax theory and is then shown to also agree with the replica (statistical physics) prediction.
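To make the setting concrete, here is a minimal sketch of the teacher-student perceptron described above, fitted with one example of an \ell_2-regularized convex loss. Several choices are assumptions made purely for illustration and are not taken from the paper: the logistic loss, plain gradient descent, standard normal entries for X_i and w^*, the finite sizes n = 200 and d = 20 (so \alpha = 10), and the values of `lam`, `lr`, and `steps`. For Gaussian inputs, the probability that the estimate and the teacher disagree on a fresh sample equals the angle between the two weight vectors divided by \pi, which is what `generalization_error` computes.

```python
import math
import random


def sample_teacher_student(n, d, rng):
    """Draw a Gaussian teacher w_star, Gaussian inputs X, and labels Y_i = sign(<X_i, w_star>)."""
    w_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    Y = [1.0 if sum(x * w for x, w in zip(row, w_star)) >= 0 else -1.0 for row in X]
    return w_star, X, Y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(X, Y, lam=0.1, lr=0.5, steps=200):
    """Gradient descent on (1/n) sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2) ||w||^2.

    The loss and all hyperparameters are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wj for wj in w]  # gradient of the l2 penalty
        for row, y in zip(X, Y):
            margin = y * sum(x * wj for x, wj in zip(row, w))
            coeff = -y * sigmoid(-margin) / n  # derivative of the logistic loss
            for j in range(d):
                grad[j] += coeff * row[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def generalization_error(w, w_star):
    """P(sign(<x, w>) != sign(<x, w_star>)) for Gaussian x: angle(w, w_star) / pi."""
    dot = sum(a * b for a, b in zip(w, w_star))
    norm = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_star))
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine) / math.pi


rng = random.Random(0)  # fixed seed so the sketch is reproducible
w_star, X, Y = sample_teacher_student(n=200, d=20, rng=rng)
w_hat = fit_logistic_l2(X, Y)
err = generalization_error(w_hat, w_star)
print(f"alpha = {200 / 20:.1f}, generalization error = {err:.3f}")
```

Repeating the run for several values of n at fixed d gives a rough finite-size picture of how the error of this particular regularized ERM estimator changes with \alpha; it is not a substitute for the asymptotic fixed-point analysis the review refers to.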


Review for NeurIPS paper: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

Neural Information Processing Systems

All the reviewers agreed that the main results presented in this paper, the rigorous fixed-point equations for binary classification with a generic loss and \ell_2 regularizer, and the more in-depth elucidation for three losses (ridge, hinge, and logistic), are sound and interesting. Although the problem setting may be thought of as simple and limited, the findings in this paper are rigorous and non-trivial, which is the strength of this paper. In this regard, it is important to clarify which statements are rigorous and which are not. Some reviewers pointed out that it would be nicer if a general criterion were provided for telling whether a particular loss achieves the rate \propto \alpha^{-1}. I think that this point would be worth mentioning in the paper.


A Neural Network Training Method Based on Neuron Connection Coefficient Adjustments

arXiv.org Artificial Intelligence

In previous studies, we introduced a neural network framework based on symmetric differential equations, along with one of its training methods. In this article, we present another training approach for this neural network. This method leverages backward signal propagation and eliminates reliance on the traditional chain derivative rule, offering a high degree of biological interpretability. Unlike the previously introduced method, this approach does not require adjustments to the fixed points of the differential equations. Instead, it focuses solely on modifying the connection coefficients between neurons, closely resembling the training process of traditional multilayer perceptron (MLP) networks. By adopting a suitable adjustment strategy, this method effectively avoids certain potential local minima. To validate this approach, we tested it on the MNIST dataset and achieved promising results. Through further analysis, we identified certain limitations of the current neural network architecture and proposed measures for improvement.


Review for NeurIPS paper: Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

Neural Information Processing Systems

Summary and Contributions: Using tools from the neural tangent kernel (NTK) literature, the authors show that a standard multilayer perceptron fails to learn high frequencies, both in theory and in practice. To overcome this spectral bias, they use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. The paper relies on applying the Fourier features work of Rahimi and Recht to approximate the NTK. The main contributions of this paper are twofold: 1) applying an existing seminal method to a new problem, which leads to surprising and interesting findings of relevance to practitioners in deep learning; and 2) a detailed empirical study of the NTK (and its approximation) in several different image-related applications.
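As an illustration of the kind of Fourier feature mapping discussed here, the sketch below maps a low-dimensional input v to [cos(2\pi B v), sin(2\pi B v)], with the rows of B drawn from a zero-mean Gaussian, in the style of random Fourier features. The function names, the Gaussian choice for B, and the use of `sigma` as the bandwidth knob are assumptions for illustration, not details confirmed by the review. The point it demonstrates is stationarity: the inner product of two mapped points is a sum of cos(2\pi b \cdot (u - v)) terms, so it depends only on the difference u - v.

```python
import math
import random


def make_fourier_mapping(in_dim, num_features, sigma, seed=0):
    """Return gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)] with B_ij ~ N(0, sigma^2).

    sigma acts as a bandwidth parameter (assumed here): larger values put
    more weight on high frequencies.
    """
    rng = random.Random(seed)
    B = [[rng.gauss(0.0, sigma) for _ in range(in_dim)] for _ in range(num_features)]

    def gamma(v):
        proj = [2.0 * math.pi * sum(b * x for b, x in zip(row, v)) for row in B]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return gamma


def feature_kernel(gamma, u, v):
    """Normalized inner product of mapped points: the mean of cos(2*pi*b.(u - v)) over rows b of B."""
    gu, gv = gamma(u), gamma(v)
    num_features = len(gu) // 2
    return sum(a * b for a, b in zip(gu, gv)) / num_features


gamma = make_fourier_mapping(in_dim=2, num_features=16, sigma=3.0)

# Two pairs of points with the same difference u - v = (-0.3, 0.2).
k_a = feature_kernel(gamma, [0.1, 0.2], [0.4, 0.0])
k_b = feature_kernel(gamma, [0.6, 0.7], [0.9, 0.5])
print(abs(k_a - k_b) < 1e-9)  # the kernel is shift-invariant up to rounding
```

In a full pipeline the mapped features would be fed to the multilayer perceptron in place of the raw coordinates, and `sigma` would be tuned per task.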