Goto

Collaborating Authors

 dlgn


Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks

arXiv.org Artificial Intelligence

Modern neural networks demonstrate state-of-the-art performance on numerous existing benchmarks; however, their high computational requirements and energy consumption prompt researchers to seek more efficient solutions for real-world deployment. Logic gate networks (LGNs) learns a large network of logic gates for efficient image classification. However, learning a network that can solve a simple problem like CIFAR-10 can take days to weeks to train. Even then, almost half of the network remains unused, causing a discretization gap. This discretization gap hinders real-world deployment of LGNs, as the performance drop between training and inference negatively impacts accuracy. We inject Gumbel noise with a straight-through estimator during training to significantly speed up training, improve neuron utilization, and decrease the discretization gap. We theoretically show that this results from implicit Hessian regularization, which improves the convergence properties of LGNs. We train networks $4.5 \times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$.


Light Differentiable Logic Gate Networks

arXiv.org Artificial Intelligence

Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost impede scaling these networks. Even with dedicated parameter initialization schemes from subsequent works, increasing depth still harms accuracy. We show that the root cause of these issues lies in the underlying parametrization of logic gate neurons themselves. To overcome this issue, we propose a reparametrization that also shrinks the parameter size logarithmically in the number of inputs per gate. For binary inputs, this already reduces the model size by 4x, speeds up the backward pass by up to 1.86x, and converges in 8.5x fewer training steps. On top of that, we show that the accuracy on CIFAR-100 remains stable and sometimes superior to the original parametrization.


From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks

arXiv.org Artificial Intelligence

Differentiable Logic Gate Networks (DLGNs) are a very fast and energy-efficient alternative to conventional feed-forward networks. With learnable combinations of logical gates, DLGNs enable fast inference by hardware-friendly execution. Since the concept of DLGNs has only recently gained attention, these networks are still in their developmental infancy, including the design and scalability of their output layer. To date, this architecture has primarily been tested on datasets with up to ten classes. This work examines the behavior of DLGNs on large multi-class datasets. We investigate its general expressiveness, its scalability, and evaluate alternative output strategies. Using both synthetic and real-world datasets, we provide key insights into the importance of temperature tuning and its impact on output layer performance. We evaluate conditions under which the Group-Sum layer performs well and how it can be applied to large-scale classification of up to 2000 classes. Figure 1: DLGNs (blue) consistently outperform MLPs (red) across classification tasks with up to 2000 classes. The result illustrates the potential of logic-gate-based architectures to remain effective when applied to large-scale classification problems. Deep artificial neural networks have improved immensely in the last few years, exhibiting impressive performance across a wide range of tasks (Golroudbari & Sabour, 2023; Noor & Ige, 2024; Ekun-dayo & Ezugwu, 2025). However, these improvements come with rapidly growing computational costs (Thompson et al., 2020; Rosenfeld, 2021; Tripp et al., 2024). This constrains their deployment in many real-world environments, particularly on edge devices and mobile phones (Zhang et al., 2020; Zheng, 2025).


Half-Space Feature Learning in Neural Networks

arXiv.org Artificial Intelligence

There currently exist two extreme viewpoints for neural network feature learning -- (i) Neural networks simply implement a kernel method (a la NTK) and hence no features are learned (ii) Neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. We argue in this paper neither interpretation is likely to be correct based on a novel viewpoint. Neural networks can be viewed as a mixture of experts, where each expert corresponds to a (number of layers length) path through a sequence of hidden units. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN), which sits midway between deep linear networks and ReLU networks. Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple -- each feature is effectively an indicator function for a region compactly described as an intersection of (number of layers) half-spaces in the input space. This viewpoint allows for a comprehensive global visualization of features, unlike the local visualizations for neurons based on saliency/activation/gradient maps. Feature learning in DLGNs is shown to happen and the mechanism with which this happens is through learning half-spaces in the input space that contain smooth regions of the target function. Due to the structure of DLGNs, the neurons in later layers are fundamentally the same as those in earlier layers -- they all represent a half-space -- however, the dynamics of gradient descent impart a distinct clustering to the later layer neurons. We hypothesize that ReLU networks also have similar feature learning behaviour.