Goto

Collaborating Authors

 matrix operation




What if Neural Networks had SVDs?

Neural Information Processing Systems

Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Techniques from (Zhang et al., 2018; Mhammedi et al., 2017) allow using the SVD in Neural Networks without computing it. In theory, the techniques can speed up matrix operations, however, in practice, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations. The algorithm increases the degree of parallelism of an underlying matrix multiplication H*X where H is an orthogonal matrix represented by a product of Householder matrices.


Batch Matrix-form Equations and Implementation of Multilayer Perceptrons

Wesselink, Wieger, Grooten, Bram, van de Wetering, Huub, Xiao, Qiao, Mocanu, Decebal Constantin

arXiv.org Artificial Intelligence

Multilayer perceptrons (MLPs) remain fundamental to modern deep learning, yet their algorithmic details are rarely presented in complete, explicit \emph{batch matrix-form}. Rather, most references express gradients per sample or rely on automatic differentiation. Although automatic differentiation can achieve equally high computational efficiency, the usage of batch matrix-form makes the computational structure explicit, which is essential for transparent, systematic analysis, and optimization in settings such as sparse neural networks. This paper fills that gap by providing a mathematically rigorous and implementation-ready specification of MLPs in batch matrix-form. We derive forward and backward equations for all standard and advanced layers, including batch normalization and softmax, and validate all equations using the symbolic mathematics library SymPy. From these specifications, we construct uniform reference implementations in NumPy, PyTorch, JAX, TensorFlow, and a high-performance C++ backend optimized for sparse operations. Our main contributions are: (1) a complete derivation of batch matrix-form backpropagation for MLPs, (2) symbolic validation of all gradient equations, (3) uniform Python and C++ reference implementations grounded in a small set of matrix primitives, and (4) demonstration of how explicit formulations enable efficient sparse computation. Together, these results establish a validated, extensible foundation for understanding, teaching, and researching neural network algorithms.


Accelerating HDC-CNN Hybrid Models Using Custom Instructions on RISC-V GPUs

Matsumi, Wakuto, Mian, Riaz-Ul-Haque

arXiv.org Artificial Intelligence

Machine learning based on neural networks has advanced rapidly, but the high energy consumption required for training and inference remains a major challenge. Hyperdimensional Computing (HDC) offers a lightweight, brain-inspired alternative that enables high parallelism but often suffers from lower accuracy on complex visual tasks. To overcome this, hybrid accelerators combining HDC and Convolutional Neural Networks (CNNs) have been proposed, though their adoption is limited by poor generalizability and programmability. The rise of open-source RISC-V architectures has created new opportunities for domain-specific GPU design. Unlike traditional proprietary GPUs, emerging RISC-V-based GPUs provide flexible, programmable platforms suitable for custom computation models such as HDC. In this study, we design and implement custom GPU instructions optimized for HDC operations, enabling efficient processing for hybrid HDC-CNN workloads. Experimental results using four types of custom HDC instructions show a performance improvement of up to 56.2 times in microbenchmark tests, demonstrating the potential of RISC-V GPUs for energy-efficient, high-performance computing.


d61e4bbd6393c9111e6526ea173a7c8b-Supplemental.pdf

Neural Information Processing Systems

In theory, the techniques can speed up matrix operations, however, in practice, they are not fast enough.



A Path to Universal Neural Cellular Automata

Béna, Gabriel, Faldor, Maxence, Goodman, Dan F. M., Cully, Antoine

arXiv.org Artificial Intelligence

Cellular automata have long been celebrated for their ability to generate complex behaviors from simple, local rules, with well-known discrete models like Conway's Game of Life proven capable of universal computation. Recent advancements have extended cellular automata into continuous domains, raising the question of whether these systems retain the capacity for universal computation. In parallel, neural cellular automata have emerged as a powerful paradigm where rules are learned via gradient descent rather than manually designed. This work explores the potential of neural cellular automata to develop a continuous Universal Cellular Automaton through training by gradient descent. We introduce a cellular automaton model, objective functions and training strategies to guide neural cellular automata toward universal computation in a continuous setting. Our experiments demonstrate the successful training of fundamental computational primitives - such as matrix multiplication and transposition - culminating in the emulation of a neural network solving the MNIST digit classification task directly within the cellular automata state. These results represent a foundational step toward realizing analog general-purpose computers, with implications for understanding universal computation in continuous dynamics and advancing the automated discovery of complex cellular automata behaviors via machine learning.


Fast Mutual Information Computation for Large Binary Datasets

Falcao, Andre O.

arXiv.org Artificial Intelligence

Mutual Information (MI) is a powerful statistical measure that quantifies shared information between random variables, particularly valuable in high-dimensional data analysis across fields like genomics, natural language processing, and network science. However, computing MI becomes computationally prohibitive for large datasets where it is typically required a pairwise computational approach where each column is compared to others. This work introduces a matrix-based algorithm that accelerates MI computation by leveraging vectorized operations and optimized matrix calculations. By transforming traditional pairwise computational approaches into bulk matrix operations, the proposed method enables efficient MI calculation across all variable pairs. Experimental results demonstrate significant performance improvements, with computation times reduced up to 50,000 times in the largest dataset using optimized implementations, particularly when utilizing hardware optimized frameworks. The approach promises to expand MI's applicability in data-driven research by overcoming previous computational limitations.


What if Neural Networks had SVDs?

Neural Information Processing Systems

Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Techniques from (Zhang et al., 2018; Mhammedi et al., 2017) allow using the SVD in Neural Networks without computing it. In theory, the techniques can speed up matrix operations, however, in practice, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations.