Convolutions Through the Lens of Tensor Networks

Dangel, Felix

arXiv.org Artificial Intelligence 

Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates generalizing theoretical and algorithmic ideas. We provide a new perspective on convolutions through tensor networks (TNs), which allow reasoning about the underlying tensor multiplications by drawing diagrams and manipulating them to perform function transformations, sub-tensor access, and fusion. We demonstrate this expressive power by deriving the diagrams of various autodiff operations and popular approximations of second-order information, with full hyper-parameter support, batching, channel groups, and generalization to arbitrary convolution dimensions. Further, we provide convolution-specific transformations based on the connectivity pattern which allow diagrams to be re-wired and simplified before evaluation. Finally, we probe computational performance, relying on established machinery for efficient TN contraction. Our TN implementation speeds up a recently-proposed KFAC variant by up to 4.5× and enables new hardware-efficient tensor dropout for approximate backpropagation.

Despite the success of transformers [68], CNNs continue to be widely used and show competitive performance when incorporating architecture modernizations [41; 40] and attention [30; 70; 11; 39]. While the intuition behind convolution is easy to grasp from graphical illustrations such as those in Dumoulin & Visin [20], convolutions are more challenging to analyze than fully-connected layers in multi-layer perceptrons (MLPs). One reason is that the operation is hard to express in matrix notation, and, even when switching to index notation, compact expressions that are convenient to work with only exist for special hyper-parameter choices [e.g. The many hyper-parameters of convolution and additional features like channel groups [35] introduce further complexity, and related objects like (higher-order) derivatives and the associated autodiff routines inherit it. TNs express tensor multiplications as diagrams (Figure 1).
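To make the TN view concrete, the following is a minimal sketch of 2D convolution written as a single tensor contraction, assuming unit stride, no padding, a single channel group, and batch size one. Binary "index pattern" tensors encode the connectivity between input positions, output positions, and kernel offsets along each spatial dimension; the convolution then reduces to one einsum over input, patterns, and kernel. The helper `make_index_pattern` and the names `Pi1`/`Pi2` are illustrative choices for this sketch, not identifiers from the paper's implementation.

```python
import torch


def make_index_pattern(input_size: int, kernel_size: int) -> torch.Tensor:
    """Binary tensor Pi of shape (input_size, output_size, kernel_size)
    with Pi[i, o, k] = 1 iff input position i = o + k (stride 1, no padding).
    Illustrative helper; not from the paper's code."""
    output_size = input_size - kernel_size + 1
    Pi = torch.zeros(input_size, output_size, kernel_size)
    for o in range(output_size):
        for k in range(kernel_size):
            Pi[o + k, o, k] = 1.0
    return Pi


C_in, C_out, I1, I2, K1, K2 = 3, 4, 8, 8, 3, 3
X = torch.randn(1, C_in, I1, I2)      # one input image
W = torch.randn(C_out, C_in, K1, K2)  # convolution kernel

Pi1 = make_index_pattern(I1, K1)      # connectivity along height
Pi2 = make_index_pattern(I2, K2)      # connectivity along width

# Contract the TN: sum over input channels, spatial positions, and kernel offsets.
# Y[n, o, a, b] = sum_{c,i,j,k,l} X[n,c,i,j] Pi1[i,a,k] Pi2[j,b,l] W[o,c,k,l]
Y = torch.einsum("ncij,iak,jbl,ockl->noab", X, Pi1, Pi2, W)

# Check against PyTorch's built-in convolution (a cross-correlation, like the
# einsum above).
Y_ref = torch.nn.functional.conv2d(X, W)
assert torch.allclose(Y, Y_ref, atol=1e-5)
```

In practice, the contraction order matters for efficiency; libraries such as opt_einsum can search for a good order automatically, which is the kind of established TN-contraction machinery the abstract refers to.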
