Goto

Collaborating Authors

 Perceptrons


Activation Space Selectable Kolmogorov-Arnold Networks

arXiv.org Artificial Intelligence

The multilayer perceptron (MLP), a fundamental paradigm in current artificial intelligence, is widely applied in fields such as computer vision and natural language processing. However, the recently proposed Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been proven to achieve performance comparable to MLPs with significantly fewer parameters. Despite this potential, the use of a single activation function space results in reduced performance of KAN and related works across different tasks. To address this issue, we propose an activation space Selectable KAN (S-KAN). S-KAN employs an adaptive strategy to choose the possible activation mode for data at each feedforward KAN node. Our approach outperforms baseline methods in seven representative function fitting tasks and significantly surpasses MLP methods with the same level of parameters. Furthermore, we extend the structure of S-KAN and propose an activation space selectable Convolutional KAN (S-ConvKAN), which achieves leading results on four general image classification datasets. Our method mitigates the performance variability of the original KAN across different tasks and demonstrates through extensive experiments that feedforward KANs with selectable activations can achieve or even exceed the performance of MLP-based methods. This work contributes to the understanding of the data-centric design of new AI paradigms and provides a foundational reference for innovations in KAN-based network architectures.


KAN versus MLP on Irregular or Noisy Functions

arXiv.org Artificial Intelligence

In this paper, we compare the performance of Kolmogorov-Arnold Networks (KAN) and Multi-Layer Perceptron (MLP) networks on irregular or noisy functions. We control the number of parameters and the size of the training samples to ensure a fair comparison. For clarity, we categorize the functions into six types: regular functions, continuous functions with local non-differentiable points, functions with jump discontinuities, functions with singularities, functions with coherent oscillations, and noisy functions. Our experimental results indicate that KAN does not always perform best. For some types of functions, MLP outperforms or performs comparably to KAN. Furthermore, increasing the size of training samples can improve performance to some extent. When noise is added to functions, the irregular features are often obscured by the noise, making it challenging for both MLP and KAN to extract these features effectively. We hope these experiments provide valuable insights for future neural network research and encourage further investigations to overcome these challenges.


HDRGS: High Dynamic Range Gaussian Splatting

arXiv.org Artificial Intelligence

Recent years have witnessed substantial advancements in the field of 3D reconstruction from 2D images, particularly following the introduction of the neural radiance field (NeRF) technique. However, reconstructing a 3D high dynamic range (HDR) radiance field, which aligns more closely with real-world conditions, from 2D multi-exposure low dynamic range (LDR) images continues to pose significant challenges. Approaches to this issue fall into two categories: grid-based and implicit-based. Implicit methods, using multi-layer perceptrons (MLP), face inefficiencies, limited solvability, and overfitting risks. Conversely, grid-based methods require significant memory and struggle with image quality and long training times. In this paper, we introduce Gaussian Splatting-a recent, high-quality, real-time 3D reconstruction technique-into this domain. We further develop the High Dynamic Range Gaussian Splatting (HDR-GS) method, designed to address the aforementioned challenges. This method enhances color dimensionality by including luminance and uses an asymmetric grid for tone-mapping, swiftly and precisely converting pixel irradiance to color. Our approach improves HDR scene recovery accuracy and integrates a novel coarse-to-fine strategy to speed up model convergence, enhancing robustness against sparse viewpoints and exposure extremes, and preventing local optima. Extensive testing confirms that our method surpasses current state-of-the-art techniques in both synthetic and real-world scenarios. Code will be released at \url{https://github.com/WuJH2001/HDRGS}


Body Transformer: Leveraging Robot Embodiment for Policy Learning

arXiv.org Artificial Intelligence

In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. Therefore, we propose Body Transformer (BoT), an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies. Additional material including the open-source code is available at https://sferrazza.cc/bot_site.


Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation

arXiv.org Artificial Intelligence

The ability to interpret Machine Learning (ML) models is becoming increasingly essential. However, despite significant progress in the field, there remains a lack of rigorous characterization regarding the innate interpretability of different models. In an attempt to bridge this gap, recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models. In this setting, if explanations for a particular model can be obtained efficiently, the model is considered interpretable (since it can be explained ``easily''). However, if generating explanations over an ML model is computationally intractable, it is considered uninterpretable. Prior research identified two key factors that influence the complexity of interpreting an ML model: (i) the type of the model (e.g., neural networks, decision trees, etc.); and (ii) the form of explanation (e.g., contrastive explanations, Shapley values, etc.). In this work, we claim that a third, important factor must also be considered for this analysis -- the underlying distribution over which the explanation is obtained. Considering the underlying distribution is key in avoiding explanations that are socially misaligned, i.e., convey information that is biased and unhelpful to users. We demonstrate the significant influence of the underlying distribution on the resulting overall interpretation complexity, in two settings: (i) prediction models paired with an external out-of-distribution (OOD) detector; and (ii) prediction models designed to inherently generate socially aligned explanations. Our findings prove that the expressiveness of the distribution can significantly influence the overall complexity of interpretation, and identify essential prerequisites that a model must possess to generate socially aligned explanations.


Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network

arXiv.org Artificial Intelligence

Feature grid Scene Representation Networks (SRNs) have been applied to scientific data as compact functional surrogates for analysis and visualization. As SRNs are black-box lossy data representations, assessing the prediction quality is critical for scientific visualization applications to ensure that scientists can trust the information being visualized. Currently, existing architectures do not support inference time reconstruction quality assessment, as coordinate-level errors cannot be evaluated in the absence of ground truth data. We propose a parameter-efficient multi-decoder SRN (MDSRN) ensemble architecture consisting of a shared feature grid with multiple lightweight multi-layer perceptron decoders. MDSRN can generate a set of plausible predictions for a given input coordinate to compute the mean as the prediction of the multi-decoder ensemble and the variance as a confidence score. The coordinate-level variance can be rendered along with the data to inform the reconstruction quality, or be integrated into uncertainty-aware volume visualization algorithms. To prevent the misalignment between the quantified variance and the prediction quality, we propose a novel variance regularization loss for ensemble learning that promotes the Regularized multi-decoder SRN (RMDSRN) to obtain a more reliable variance that correlates closely to the true model error. We comprehensively evaluate the quality of variance quantification and data reconstruction of Monte Carlo Dropout, Mean Field Variational Inference, Deep Ensemble, and Predicting Variance compared to the proposed MDSRN and RMDSRN across diverse scalar field datasets. We demonstrate that RMDSRN attains the most accurate data reconstruction and competitive variance-error correlation among uncertain SRNs under the same neural network parameter budgets.


KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example

arXiv.org Artificial Intelligence

Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons, suggesting advantages in performance and interpretability. We study a typical binary event classification task in high-energy physics including high-level features and comment on the performance and interpretability of KANs in this context. We find that the learned activation functions of a one-layer KAN resemble the log-likelihood ratio of the input features. In deeper KANs, the activations in the first KAN layer differ from those in the one-layer KAN, which indicates that the deeper KANs learn more complex representations of the data. We study KANs with different depths and widths and we compare them to multilayer perceptrons in terms of performance and number of trainable parameters. For the chosen classification task, we do not find that KANs are more parameter efficient. However, small KANs may offer advantages in terms of interpretability that come at the cost of only a moderate loss in performance.


KAN based Autoencoders for Factor Models

arXiv.org Artificial Intelligence

Inspired by recent advances in Kolmogorov-Arnold Networks (KANs), we introduce a novel approach to latent factor conditional asset pricing models. While previous machine learning applications in asset pricing have predominantly used Multilayer Perceptrons with ReLU activation functions to model latent factor exposures, our method introduces a KAN-based autoencoder which surpasses MLP models in both accuracy and interpretability. Our model offers enhanced flexibility in approximating exposures as nonlinear functions of asset characteristics, while simultaneously providing users with an intuitive framework for interpreting latent factors. Empirical backtesting demonstrates our model's superior ability to explain cross-sectional risk exposures. Moreover, long-short portfolios constructed using our model's predictions achieve higher Sharpe ratios, highlighting its practical value in investment management.


A Tiny Supervised ODL Core with Auto Data Pruning for Human Activity Recognition

arXiv.org Artificial Intelligence

In this paper, we introduce a low-cost and low-power tiny supervised on-device learning (ODL) core that can address the distributional shift of input data for human activity recognition. Although ODL for resource-limited edge devices has been studied recently, how exactly to provide the training labels to these devices at runtime remains an open-issue. To address this problem, we propose to combine an automatic data pruning with supervised ODL to reduce the number queries needed to acquire predicted labels from a nearby teacher device and thus save power consumption during model retraining. The data pruning threshold is automatically tuned, eliminating a manual threshold tuning. As a tinyML solution at a few mW for the human activity recognition, we design a supervised ODL core that supports our automatic data pruning using a 45nm CMOS process technology. We show that the required memory size for the core is smaller than the same-shaped multilayer perceptron (MLP) and the power consumption is only 3.39mW. Experiments using a human activity recognition dataset show that the proposed automatic data pruning reduces the communication volume by 55.7% and power consumption accordingly with only 0.9% accuracy loss.


3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks

arXiv.org Artificial Intelligence

Panoramic X-ray (PX) is a prevalent modality in dental practice for its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information, and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misa-lignment detection and classification. Reconstructing 3D structures directly from 2D PX has recently been explored to address limitations with existing methods primarily reliant on Convolutional Neural Networks (CNNs) for direct 2D-to-3D mapping. These methods, however, are unable to correctly infer depth-axis spatial information. In addition, they are limited by the in-trinsic locality of convolution operations, as the convolution kernels only capture the information of immediate neighborhood pixels. In this study, we propose a progressive hybrid Multilayer Perceptron (MLP)-CNN pyra-mid network (3DPX) for 2D-to-3D oral PX reconstruction. We introduce a progressive reconstruction strategy, where 3D images are progressively re-constructed in the 3DPX with guidance imposed on the intermediate recon-struction result at each pyramid level. Further, motivated by the recent ad-vancement of MLPs that show promise in capturing fine-grained long-range dependency, our 3DPX integrates MLPs and CNNs to improve the semantic understanding during reconstruction. Extensive experiments on two large datasets involving 464 studies demonstrate that our 3DPX outperforms state-of-the-art 2D-to-3D oral reconstruction methods, including standalone MLP and transformers, in reconstruction quality, and also im-proves the performance of downstream angular misalignment classification tasks.