Ray-Tracing for Conditionally Activated Neural Networks

Claudio Gallicchio, Giuseppe Nuti

arXiv.org Artificial Intelligence 

ABSTRACT

In this paper, we introduce a novel architecture for conditionally activated neural networks that combines a hierarchical construction of multiple Mixture of Experts (MoE) layers with a sampling mechanism that progressively converges to an optimized configuration of expert activation. This methodology enables the dynamic unfolding of the network's architecture, facilitating efficient path-specific training. Experimental results demonstrate that this approach achieves competitive accuracy compared to conventional baselines while significantly reducing the parameter count required for inference. The approach we propose implements a neural network in which blocks (experts) are stacked over multiple layers. By expressing each block's output as the expected firing rate of a stochastic computation path, we can simultaneously solve the inference and the selective activation problems. Importantly, since we model every block's output as its expected activation rate, initiating a computational path from the input nodes or from a block in the middle of the network yields comparable results, allowing for a variety of new computational approaches that balance width-first versus depth-first computation.
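To make the idea of treating each block's output as an expected firing rate concrete, the following is a minimal sketch of one conditionally activated MoE layer. It assumes a softmax gate whose probabilities stand in for the experts' expected activation rates and a simple threshold-based rule for skipping experts at inference; the class, parameter names, and pruning rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ProbabilisticMoELayer(nn.Module):
    """Sketch of a single MoE layer whose output is the expectation over
    stochastically activated experts. Gating probabilities play the role
    of the experts' expected firing rates (illustrative, not the paper's code)."""

    def __init__(self, dim, num_experts, hidden=64, threshold=0.05):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)
        self.threshold = threshold  # assumed pruning rule, not from the paper

    def forward(self, x):
        # Expected firing rate of each expert given the input.
        rates = torch.softmax(self.gate(x), dim=-1)  # (batch, num_experts)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            r = rates[:, i:i + 1]
            # Conditional activation: skip experts whose expected firing
            # rate is negligible for the whole batch (inference only).
            if self.training or r.max() > self.threshold:
                out = out + r * expert(x)
        # The layer output is the expectation of the stochastic computation path.
        return out


# Stacking such layers gives a hierarchical construction: each layer consumes
# the expected output of the one below, so starting the computation from a
# mid-network block uses the same interface as starting from the input.
model = nn.Sequential(*(ProbabilisticMoELayer(dim=16, num_experts=4) for _ in range(3)))
y = model(torch.randn(8, 16))
```

Because every layer emits an expectation rather than a sampled path, inference can prune low-rate experts without retraining, which is one way to read the parameter-count reduction claimed above; the specific thresholding shown here is only one possible realization of that idea.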