Perceptrons
Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks
Robin, David A. R., Scaman, Kevin, Lelarge, Marc
We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex optimization, this new form of convergence, and the techniques introduced to prove such convergence, pave the way for a usable deep learning convergence theory in the near future, without overparameterization assumptions relating the number of parameters and training samples. We define these architectures from a simple computation graph and a mechanism to lift it, thus increasing the number of parameters, generalizing the idea of increasing the widths of multi-layer perceptrons. We show that architectures similar to most common deep learning models are present in this class, obtained by sparsifying the weight tensors of usual architectures at initialization. Leveraging tools of algebraic topology and random graph theory, we use the computation graph's geometry to propagate properties guaranteeing convergence to any precision for these large sparse models.
Explainability in Neural Networks for Natural Language Processing Tasks
Mersha, Melkamu, Bitewa, Mingiziem, Abay, Tsion, Kalita, Jugal
Neural networks are widely regarded as black-box models, creating significant challenges in understanding their inner workings, especially in natural language processing (NLP) applications. To address this opacity, model explanation techniques like Local Interpretable Model-Agnostic Explanations (LIME) have emerged as essential tools for providing insights into the behavior of these complex systems. This study leverages LIME to interpret a multi-layer perceptron (MLP) neural network trained on a text classification task. By analyzing the contribution of individual features to model predictions, the LIME approach enhances interpretability and supports informed decision-making. Despite its effectiveness in offering localized explanations, LIME has limitations in capturing global patterns and feature interactions. This research highlights the strengths and shortcomings of LIME and proposes directions for future work to achieve more comprehensive interpretability in neural NLP models.
Optimizing Supply Chain Networks with the Power of Graph Neural Networks
Chen, Chi-Sheng, Chen, Ying-Jung
Graph Neural Networks (GNNs) have emerged as transformative tools for modeling complex relational data, offering unprecedented capabilities in tasks like forecasting and optimization. This study investigates the application of GNNs to demand forecasting within supply chain networks using the SupplyGraph dataset, a benchmark for graph-based supply chain analysis. By leveraging advanced GNN methodologies, we enhance the accuracy of forecasting models, uncover latent dependencies, and address temporal complexities inherent in supply chain operations. Comparative analyses demonstrate that GNN-based models significantly outperform traditional approaches, including Multilayer Perceptrons (MLPs) and Graph Convolutional Networks (GCNs), particularly in single-node demand forecasting tasks. The integration of graph representation learning with temporal data highlights GNNs' potential to revolutionize predictive capabilities for inventory management, production scheduling, and logistics optimization. This work underscores the pivotal role of forecasting in supply chain management and provides a robust framework for advancing research and applications in this domain.
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
Transformers have achieved remarkable success across diverse domains, but their monolithic architecture presents challenges in interpretability, adaptability, and scalability. This paper introduces a novel modular Transformer architecture that explicitly decouples knowledge and reasoning through a generalized cross-attention mechanism to a globally shared knowledge base with layer-specific transformations, specifically designed for effective knowledge retrieval. Critically, we provide a rigorous mathematical derivation demonstrating that the Feed-Forward Network (FFN) in a standard Transformer is a specialized case (a closure) of this generalized cross-attention, revealing its role in implicit knowledge retrieval and validating our design. This theoretical framework provides a new lens for understanding FFNs and lays the foundation for future research exploring enhanced interpretability, adaptability, and scalability, enabling richer interplay with external knowledge bases and other systems.
Variational autoencoders with latent high-dimensional steady geometric flows for dynamics
We develop Riemannian approaches to variational autoencoders (VAEs) for PDE-type ambient data with regularizing geometric latent dynamics, which we refer to as VAE-DLM, or VAEs with dynamical latent manifolds. We redevelop the VAE framework such that manifold geometries, subject to our geometric flow, embedded in Euclidean space are learned in the intermediary latent space developed by encoders and decoders. By tailoring the geometric flow in which the latent space evolves, we induce latent geometric properties of our choosing, which are reflected in empirical performance. We reformulate the traditional evidence lower bound (ELBO) loss with a considerate choice of prior. We develop a linear geometric flow with a steady-state regularizing term. This flow requires only automatic differentiation of one time derivative, and can be solved in moderately high dimensions in a physics-informed approach, allowing more expressive latent representations. We discuss how this flow can be formulated as a gradient flow, and maintains entropy away from metric singularity. This, along with an eigenvalue penalization condition, helps ensure the manifold is sufficiently large in measure, nondegenerate, and a canonical geometry, which contribute to a robust representation. Our methods focus on the modified multi-layer perceptron architecture with tanh activations for the manifold encoder-decoder. We demonstrate, on our datasets of interest, our methods perform at least as well as the traditional VAE, and oftentimes better. Our methods can outperform this and a VAE endowed with our proposed architecture, frequently reducing out-of-distribution (OOD) error between 15% to 35% on select datasets. We highlight our method on ambient PDEs whose solutions maintain minimal variation in late times. We provide empirical justification towards how we can improve robust learning for external dynamics with VAEs.
Best Transition Matrix Esitimation or Best Label Noise Robustness Classifier? Two Possible Methods to Enhance the Performance of T-revision
Liu, Haixu, Tao, Zerui, Zhang, Naihui, Liu, Sixing
Label noise refers to incorrect labels in a dataset caused by human errors or collection defects, which is common in real-world applications and can significantly reduce the accuracy of models. This report explores how to estimate noise transition matrices and construct deep learning classifiers that are robust against label noise. In cases where the transition matrix is known, we apply forward correction and importance reweighting methods to correct the impact of label noise using the transition matrix. When the transition matrix is unknown or inaccurate, we use the anchor point assumption and T-Revision series methods to estimate or correct the noise matrix. In this study, we further improved the T-Revision method by developing T-Revision-Alpha and T-Revision-Softmax to enhance stability and robustness. Additionally, we designed and implemented two baseline classifiers, a Multi-Layer Perceptron (MLP) and ResNet-18, based on the cross-entropy loss function. We compared the performance of these methods on predicting clean labels and estimating transition matrices using the FashionMINIST dataset with known noise transition matrices. For the CIFAR-10 dataset, where the noise transition matrix is unknown, we estimated the noise matrix and evaluated the ability of the methods to predict clean labels.
Multi-Task Semantic Communication With Graph Attention-Based Feature Correlation Extraction
Yu, Xi, Lv, Tiejun, Li, Weicai, Ni, Wei, Niyato, Dusit, Hossain, Ekram
Multi-task semantic communication can serve multiple learning tasks using a shared encoder model. Existing models have overlooked the intricate relationships between features extracted during an encoding process of tasks. This paper presents a new graph attention inter-block (GAI) module to the encoder/transmitter of a multi-task semantic communication system, which enriches the features for multiple tasks by embedding the intermediate outputs of encoding in the features, compared to the existing techniques. The key idea is that we interpret the outputs of the intermediate feature extraction blocks of the encoder as the nodes of a graph to capture the correlations of the intermediate features. Another important aspect is that we refine the node representation using a graph attention mechanism to extract the correlations and a multi-layer perceptron network to associate the node representations with different tasks. Consequently, the intermediate features are weighted and embedded into the features transmitted for executing multiple tasks at the receiver. Experiments demonstrate that the proposed model surpasses the most competitive and publicly available models by 11.4% on the CityScapes 2Task dataset and outperforms the established state-of-the-art by 3.97% on the NYU V2 3Task dataset, respectively, when the bandwidth ratio of the communication channel (i.e., compression level for transmission over the channel) is as constrained as 1 12 .
Automatic Construction of Pattern Classifiers Capable of Continuous Incremental Learning and Unlearning Tasks Based on Compact-Sized Probabilistic Neural Network
Hoya, Tetsuya, Morita, Shunpei
This paper proposes a novel approach to pattern classification using a probabilistic neural network model. The strategy is based on a compact-sized probabilistic neural network capable of continuous incremental learning and unlearning tasks. The network is constructed/reconstructed using a simple, one-pass network-growing algorithm with no hyperparameter tuning. Then, given the training dataset, its structure and parameters are automatically determined and can be dynamically varied in continual incremental and decremental learning situations. The algorithm proposed in this work involves no iterative or arduous matrix-based parameter approximations but a simple data-driven updating scheme. Simulation results using nine publicly available databases demonstrate the effectiveness of this approach, showing that compact-sized probabilistic neural networks constructed have a much smaller number of hidden units compared to the original probabilistic neural network model and yet can achieve a similar classification performance to that of multilayer perceptron neural networks in standard classification tasks, while also exhibiting sufficient capability in continuous class incremental learning and unlearning tasks.
KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning
Yu, Fangchen, Hu, Ruilizhen, Lin, Yidong, Ma, Yuqi, Huang, Zhenghao, Li, Wenye
The Kolmogorov-Arnold Network (KAN) has recently gained attention as an alternative to traditional multi-layer perceptrons (MLPs), offering improved accuracy and interpretability by employing learnable activation functions on edges. In this paper, we introduce the Kolmogorov-Arnold Auto-Encoder (KAE), which integrates KAN with autoencoders (AEs) to enhance representation learning for retrieval, classification, and denoising tasks. Leveraging the flexible polynomial functions in KAN layers, KAE captures complex data patterns and non-linear relationships. Experiments on benchmark datasets demonstrate that KAE improves latent representation quality, reduces reconstruction errors, and achieves superior performance in downstream tasks such as retrieval, classification, and denoising, compared to standard autoencoders and other KAN variants. These results suggest KAE's potential as a useful tool for representation learning. Our code is available at \url{https://github.com/SciYu/KAE/}.
Deep ReLU networks -- injectivity capacity upper bounds
We study deep ReLU feed forward neural networks (NN) and their injectivity abilities. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden layers architecture, it is defined as the minimal ratio between number of network's outputs and inputs which ensures unique recoverability of the input from a realizable output. A strong recent progress in precisely studying single ReLU layer injectivity properties is here moved to a deep network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing an isomorphism between studying single layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle properties of the extended $\ell_0$ spherical perceptrons and implicitly of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery in practical use. From these we observe a rapidly decreasing tendency in needed layers' expansions, i.e., we observe a rapid \emph{expansion saturation effect}. Only $4$ layers of depth are sufficient to closely approach level of no needed expansion -- a result that fairly closely resembles observations made in practical experiments and that has so far remained completely untouchable by any of the existing mathematical methodologies.