AITopics

2501.17896

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > Singapore (0.04)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.67)

Industry:

Aerospace & Defense (0.56)
Transportation > Air (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

arXiv.org Artificial Intelligence, Jan-28-2025

A Genetic Algorithm-Based Approach for Automated Optimization of Kolmogorov-Arnold Networks in Classification Tasks

Long, Quan, Wang, Bin, Xue, Bing, Zhang, Mengjie

To address the issue of interpretability in multilayer perceptrons (MLPs), Kolmogorov-Arnold Networks (KANs) were introduced in 2024. However, optimizing KAN structures is labor-intensive, typically requiring manual intervention and parameter tuning. This paper proposes GA-KAN, a genetic algorithm-based approach that automates the optimization of KANs, requiring no human intervention in the design process. To the best of our knowledge, this is the first time that evolutionary computation has been explored to optimize KANs automatically. Furthermore, inspired by the use of sparse connectivity in MLPs to effectively reduce the number of parameters, GA-KAN explores sparse connectivity to tackle the challenge of extensive parameter spaces in KANs. GA-KAN is validated on two toy datasets, achieving optimal results without the manual tuning required by the original KAN. Additionally, GA-KAN demonstrates superior performance across five classification datasets, outperforming traditional methods on all datasets and providing interpretable symbolic formulae for the Wine and Iris datasets, thereby enhancing model transparency. GA-KAN also significantly reduces the number of parameters relative to the standard KAN across all five datasets. The core contributions of GA-KAN include automated optimization, a new encoding strategy, and a new decoding process, which together improve accuracy and interpretability and reduce the number of parameters.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2501.17411

Country:

North America > United States > Wisconsin (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Jan-27-2025, 15:06:48 GMT

Reviews: A Flexible Generative Framework for Graph-based Semi-supervised Learning

This work employs techniques developed in the network science literature, such as the latent space model (LSM) and the stochastic block model (SBM), to propose a generative model for features X, outputs Y, and graph G, and it uses graph neural networks to approximate the posterior of missing outputs given X, observed Y, and G. This work is a wise combination of recent methods to effectively address the problem of graph-based semi-supervised learning. However, I have some concerns, which are summarized as follows:
- Although the paper proposes a new, interesting generative method for graph-based semi-supervised learning, it is not especially novel, as it uses existing methods such as LSM, SBM, GCN, and GAT as its building blocks.
- It seems the model is generative only for G given X and Y. The remaining part is factorized as p(Y,X) = p(Y|X) p(X), and p(Y|X) is modeled via a multi-layer perceptron, which is a discriminative model. That is why the authors discard X in all the analyses, as in any other discriminative model, and say that everything is conditioned on X. I think this makes the proposed model not fully generative. It is only generative for G but not for X and Y.
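The reviewer's second concern can be written out as a factorization. This is a reading of the review text only, not notation taken from the paper: the joint over features, outputs, and graph splits into a graph term and the pair p(Y|X) p(X), and conditioning on X drops the last factor.

```latex
% Joint factorization as described in the review (notation assumed here)
p(X, Y, G) = \underbrace{p(G \mid X, Y)}_{\text{generative part (LSM / SBM)}}
             \; \underbrace{p(Y \mid X)}_{\text{MLP, discriminative}}
             \; p(X)

% Conditioning everything on X leaves p(X) unmodeled:
p(Y, G \mid X) = p(G \mid X, Y) \, p(Y \mid X)
```

Under this reading, only the first factor describes how data are generated, which is the basis for the reviewer's "not fully generative" remark.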

flexible generative framework, graph neural network, graph-based semi-supervised learning, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.58)

Neural Information Processing Systems, Jan-27-2025, 03:17:39 GMT

Review for NeurIPS paper: Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Summary and Contributions: This paper is about compressed sensing (CS) under generative priors. In such a problem, undersampled linear measurements of a signal of interest are provided, and the signal is sought. The mathematical ambiguity is resolved by finding the feasible point that is in the range of a trained generative model (such as a GAN), which is itself computed by solving an empirical risk minimization. Existing theory establishes a convergence guarantee of an efficient algorithm under an appropriate random model for the weights of the generative prior. The convergence guarantee assumes that the generative model is a multilayer perceptron where the width of each layer grows log-linearly.
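The setup in this summary can be sketched numerically. The code below is an illustration under stated assumptions, not the algorithm analyzed in the paper: the generator is a small two-layer ReLU network with random Gaussian weights, the sizes (`k`, `n1`, `n`, `m`) are arbitrary, and the empirical risk 0.5 * ||A G(z) - y||^2 is minimized by plain gradient descent with a backtracking step size.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def generator(z, W1, W2):
    """Two-layer ReLU network G(z) = relu(W2 @ relu(W1 @ z))."""
    return relu(W2 @ relu(W1 @ z))

def loss_and_grad(z, A, y, W1, W2):
    """Empirical risk 0.5 * ||A G(z) - y||^2 and a (sub)gradient with respect to z."""
    h1_pre = W1 @ z
    h1 = relu(h1_pre)
    h2_pre = W2 @ h1
    x = relu(h2_pre)
    r = A @ x - y
    loss = 0.5 * float(r @ r)
    g_h2 = (A.T @ r) * (h2_pre > 0)        # back through the outer ReLU
    g_h1 = (W2.T @ g_h2) * (h1_pre > 0)    # back through the inner ReLU
    return loss, W1.T @ g_h1

def recover(A, y, W1, W2, z0, steps=200, lr=1.0):
    """Gradient descent on z; a step is accepted only if it lowers the loss."""
    z = z0.copy()
    loss, grad = loss_and_grad(z, A, y, W1, W2)
    history = [loss]
    for _step in range(steps):
        step = lr
        for _try in range(30):             # halve the step until the loss decreases
            z_new = z - step * grad
            new_loss, new_grad = loss_and_grad(z_new, A, y, W1, W2)
            if new_loss < loss:
                z, loss, grad = z_new, new_loss, new_grad
                break
            step *= 0.5
        history.append(loss)
    return z, history

# Illustrative sizes (assumptions): latent dim k, hidden width n1,
# signal dim n, and m < n undersampled linear measurements.
rng = np.random.default_rng(0)
k, n1, n, m = 5, 40, 100, 30
W1 = rng.standard_normal((n1, k)) / np.sqrt(n1)
W2 = rng.standard_normal((n, n1)) / np.sqrt(n)
A = rng.standard_normal((m, n)) / np.sqrt(m)

z_true = rng.standard_normal(k)
y = A @ generator(z_true, W1, W2)          # noiseless measurements of the signal

z_hat, history = recover(A, y, W1, W2, rng.standard_normal(k))
print(f"loss: {history[0]:.4f} -> {history[-1]:.6f}")
```

The backtracking rule makes the recorded loss non-increasing by construction; whether the iterate reaches the true signal depends on the network and the initialization, which is the question the theory in the reviewed paper addresses.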

compressed sensing, constant-expansion suffice, generative model, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.62)

Dahal, Ashim, Bajgai, Prabin, Rahimi, Nick

Analysis of Zero Day Attack Detection Using MLP and XAI

arXiv.org Artificial Intelligence, Jan-27-2025

Any exploit taking advantage of a zero-day vulnerability is called a zero-day attack. Previous research and social media trends show a massive demand for research in zero-day attack detection. This paper analyzes Machine Learning (ML) and Deep Learning (DL) based approaches to creating Intrusion Detection Systems (IDS) and scrutinizes them using Explainable AI (XAI) by training an explainer on randomly sampled data from the testing set. The focus is on the KDD99 dataset, which is the most studied of all the datasets for detecting zero-day attacks. The paper aims to synthesize the dataset to have fewer classes for multi-class classification, test ML and DL approaches on pattern recognition, establish the robustness and dependability of the model, and establish the interpretability and scalability of the model. We evaluated the performance of four multilayer perceptron (MLP) models trained on the KDD99 dataset: baseline ML models, weighted ML models, truncated ML models, and weighted truncated ML models. Our results demonstrate that the truncated ML model achieves the highest accuracy (99.62%), precision, and recall, while the weighted truncated ML model shows lower accuracy (97.26%) but better class representation (less bias) across all the classes, with an improved unweighted recall score. We also used SHapley Additive exPlanations (SHAP) to train an explainer for our truncated models to check feature importance in both the weighted and unweighted models.

artificial intelligence, deep learning, machine learning, (15 more...)

2501.16638

Country:

North America > United States > Mississippi > Forrest County > Hattiesburg (0.14)
North America > United States > Hawaii (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Neural Information Processing Systems, Jan-26-2025, 23:58:54 GMT

Reviews: Rethinking Kernel Methods for Node Representation Learning on Graphs

After rebuttal: thank you for the additional experiments. They strengthen the empirical contribution of the paper, so I've increased my score to a 7.
Originality: The paper is a novel combination of known techniques: by reinterpreting the iterative node aggregation procedure of Kipf et al.'s GCN as a feature smoothing technique, they develop a novel feature mapping function for learning positive semi-definite (psd) graph kernels. The key difference from the Kipf et al. approach is that they separate the node aggregation and non-linear representation learning components: node features are the output of a multi-layer perceptron and are then aggregated once (rather than at every layer) by a multi-hop aggregation function. They argue theoretically that this approach is universal in the sense that it can approximate any invertible psd kernel.
Quality: I thought the empirical results of the paper were interesting because they suggest that decoupling the aggregation and representation learning components of GCN-style models leads to better performance (at least on these datasets).
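The decoupling this review describes can be sketched as follows. The review does not give the paper's aggregation function, so this sketch assumes repeated multiplication by the symmetrically normalized adjacency matrix with self-loops, as in GCN; the function names, the toy path graph, and the random weights are all hypothetical. Node features pass through an MLP first and are aggregated only once, at the end, and the resulting Gram matrix is a psd kernel by construction.

```python
import numpy as np

def normalized_adjacency(adj):
    """GCN-style operator D^{-1/2} (A + I) D^{-1/2} (assumed aggregation operator)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mlp(x, w1, w2):
    """Two-layer perceptron applied to each node independently (no graph input)."""
    return np.maximum(x @ w1, 0.0) @ w2

def decoupled_embed(adj, x, w1, w2, hops=2):
    """Representation learning first, then a single multi-hop aggregation."""
    h = mlp(x, w1, w2)
    s = normalized_adjacency(adj)
    for _ in range(hops):
        h = s @ h                      # smooth features over one more hop
    return h

# Toy example: a path graph on 4 nodes with random features and weights.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.standard_normal((4, 3))
w1 = rng.standard_normal((3, 8))
w2 = rng.standard_normal((8, 2))

Z = decoupled_embed(adj, x, w1, w2, hops=2)
K = Z @ Z.T                            # Gram matrix of embeddings: psd kernel on nodes
print(K.round(3))
```

By contrast, a GCN layer interleaves the two steps, applying the aggregation operator before every non-linearity.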

contribution, node representation learning, rethinking kernel method, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.40)

Neural Information Processing Systems, Jan-26-2025, 12:53:13 GMT

Review for NeurIPS paper: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

Specifically, their rigorous results concern the model Y_i = sign(\langle X_i, w^* \rangle) for i = 1, 2, ..., n, with a Gaussian prior on w^* and Gaussian X_i, both living in d dimensions. They assume n/d = \alpha (constant) as n, d grow to infinity. In [10] the Bayes-optimal reconstruction error was studied (verifying a stats physics prediction), and here they discuss the performance of regularized ERM (potentially convex methods) in achieving it. Their first set of results concerns the performance of any \ell_2-regularized convex loss, showing that it can be tracked using a fixed-point equation. The result is based on Gordon's minimax theory and is then shown to also verify the replica (stats physics) prediction.
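A small simulation can make this setting concrete. It is a finite-size sketch, not the paper's asymptotic analysis: the dimension `d`, the ratio `alpha`, and the regularization strength `lam` are arbitrary choices, and only the square loss (ridge) is shown because it has a closed form. For isotropic Gaussian inputs, the probability that sign(<x, w>) disagrees with sign(<x, w*>) equals arccos(rho) / pi, where rho is the cosine of the angle between w and w*.

```python
import numpy as np

def ridge_erm(X, y, lam):
    """Minimizer of ||X w - y||^2 + lam * ||w||^2 (square loss, l2 regularizer)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def generalization_error(w, w_star):
    """P[sign(<x, w>) != sign(<x, w_star>)] for isotropic Gaussian x."""
    rho = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
    return np.arccos(np.clip(rho, -1.0, 1.0)) / np.pi

rng = np.random.default_rng(0)
d, alpha = 200, 3.0                  # illustrative sizes; the theory takes n, d to infinity
n = int(alpha * d)

w_star = rng.standard_normal(d)      # teacher weights drawn from a Gaussian prior
X = rng.standard_normal((n, d))      # Gaussian inputs
y = np.sign(X @ w_star)              # labels Y_i = sign(<X_i, w*>)

w_hat = ridge_erm(X, y, lam=1.0)
err = generalization_error(w_hat, w_star)
print(f"alpha = {alpha}, ridge generalization error ~ {err:.3f}")
```

Repeating this for several values of `alpha` and `lam` gives an empirical curve that could be compared against the fixed-point predictions the review mentions.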

approaching bayes error, generalization error, high-dimensional perceptron, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.44)

Neural Information Processing Systems, Jan-26-2025, 12:53:05 GMT

Review for NeurIPS paper: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

All the reviewers agreed that the main results presented in this paper, the rigorous fixed-point equations for binary classification with generic loss and l2 regularizer, and the more in-depth elucidation for three losses (ridge, hinge, and logistic), are sound and interesting. Although the problem setting may be thought of as simple and limited, the findings in this paper are rigorous and non-trivial, which is the strength of this paper. In this regard, clarifying which statements are rigorous and which are not would be important. Some reviewers pointed out that it would be nicer if a general criterion were provided for telling whether a particular loss achieves the rate \propto \alpha^{-1}. I think that this point would be worth mentioning in this paper.

approaching bayes error, generalization error, high-dimensional perceptron, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.40)

arXiv.org Artificial Intelligence, Jan-25-2025

A Neural Network Training Method Based on Neuron Connection Coefficient Adjustments

Jiang, Kun

In previous studies, we introduced a neural network framework based on symmetric differential equations, along with one of its training methods. In this article, we present another training approach for this neural network. This method leverages backward signal propagation and eliminates reliance on the traditional chain derivative rule, offering a high degree of biological interpretability. Unlike the previously introduced method, this approach does not require adjustments to the fixed points of the differential equations. Instead, it focuses solely on modifying the connection coefficients between neurons, closely resembling the training process of traditional multilayer perceptron (MLP) networks. By adopting a suitable adjustment strategy, this method effectively avoids certain potential local minima. To validate this approach, we tested it on the MNIST dataset and achieved promising results. Through further analysis, we identified certain limitations of the current neural network architecture and proposed measures for improvement.

artificial intelligence, machine learning, neural network, (16 more...)

2502.10414

Country:

Europe > United Kingdom > England > Staffordshire (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.70)

Neural Information Processing Systems, Jan-24-2025, 12:19:55 GMT

Review for NeurIPS paper: Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

Summary and Contributions: Using tools from the neural tangent kernel (NTK) literature, the authors show that a standard multilayer perceptron fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, they use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. The paper relies on applying the Fourier features work by Rahimi and Recht to approximate the NTK kernel. The main contributions of this paper are twofold: 1) applying an existing seminal method to a new problem, which leads to surprising and interesting findings of relevance to practitioners in deep learning; and 2) a detailed empirical study of the NTK (and its approximation) on several different image-related applications.
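The Fourier feature mapping mentioned in this summary can be sketched in a few lines. The sketch assumes the Gaussian variant, in which the input v is mapped to [cos(2*pi*B v), sin(2*pi*B v)] with the rows of B drawn from a zero-mean Gaussian; the scale `sigma` and the number of frequencies are illustrative values, and the downstream MLP is omitted.

```python
import numpy as np

def fourier_features(v, B):
    """Map inputs v of shape (N, d) to [cos(2*pi*v B^T), sin(2*pi*v B^T)], shape (N, 2m)."""
    proj = 2.0 * np.pi * v @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
sigma = 10.0                                # frequency scale: controls the kernel bandwidth
B = sigma * rng.standard_normal((64, 2))    # 64 random frequencies for 2-D coordinates
coords = rng.uniform(0.0, 1.0, size=(5, 2)) # e.g. pixel coordinates in [0, 1]^2

feats = fourier_features(coords, B)         # these features would replace raw coords as MLP input

# The induced kernel depends only on the difference between inputs (stationarity):
# <gamma(v1), gamma(v2)> = sum_j cos(2*pi * b_j . (v1 - v2)).
shift = np.array([0.3, -0.2])
k_orig = fourier_features(coords[:1], B) @ fourier_features(coords[1:2], B).T
k_shifted = (fourier_features(coords[:1] + shift, B)
             @ fourier_features(coords[1:2] + shift, B).T)
print(feats.shape, np.allclose(k_orig, k_shifted))
```

A larger `sigma` samples higher frequencies and corresponds to a narrower kernel, which is one way to read the "tunable bandwidth" in the summary.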

low dimensional domain, network learn high frequency function, neurips paper, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.70)