Supplementary Material for HUMUS-Net: Hybrid Unrolled Multi-Scale Network Architecture for Accelerated MRI Reconstruction

HUMUS-Net baseline details
Our default model has 3 RSTB-D downsampling blocks, 2 RSTB-B bottleneck blocks and 3 RSTB-U upsampling blocks, with 3, 6 and 12 attention heads in the D/U blocks and 24 attention heads in the bottleneck block. For Swin Transformer layers (STLs), the window size is 8 for all methods and an MLP ratio (hidden_dim/input_dim) of 2 is used. Each RSTB block consists of 2 STLs with an embedding dimension of 66. For HUMUS-Net-L, we increase the embedding dimension to 96. We use 8 cascades of unrolling, with a U-Net with 16 channels as the sensitivity map estimator (same as in E2E-VarNet).
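For quick reference, the hyperparameters listed above can be collected in a small configuration object. The following is only a minimal sketch: the class name `HumusNetConfig`, its field names and the helper methods are hypothetical and do not come from the authors' code; only the numeric values are taken from the text. The STL count assumes that every D, B and U block is one RSTB with 2 STLs.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class HumusNetConfig:
    """Hypothetical container for the hyperparameters stated in the text."""

    num_down_blocks: int = 3                     # RSTB-D downsampling blocks
    num_bottleneck_blocks: int = 2               # RSTB-B bottleneck blocks
    num_up_blocks: int = 3                       # RSTB-U upsampling blocks
    du_heads: Tuple[int, int, int] = (3, 6, 12)  # attention heads in D/U blocks
    bottleneck_heads: int = 24                   # attention heads in the bottleneck
    window_size: int = 8                         # Swin attention window size
    mlp_ratio: float = 2.0                       # hidden_dim / input_dim
    stls_per_rstb: int = 2                       # Swin Transformer layers per RSTB
    embed_dim: int = 66                          # 96 for HUMUS-Net-L
    num_cascades: int = 8                        # unrolled cascades
    sens_channels: int = 16                      # U-Net sensitivity map estimator

    def mlp_hidden_dim(self) -> int:
        # MLP hidden width at the base embedding dimension.
        return int(self.embed_dim * self.mlp_ratio)

    def stls_per_cascade(self) -> int:
        # Assumes each D/B/U block is a single RSTB containing stls_per_rstb STLs.
        blocks = (self.num_down_blocks
                  + self.num_bottleneck_blocks
                  + self.num_up_blocks)
        return blocks * self.stls_per_rstb


DEFAULT = HumusNetConfig()
# HUMUS-Net-L differs from the default only in the embedding dimension.
LARGE = replace(DEFAULT, embed_dim=96)
```

Under these assumptions, `DEFAULT.mlp_hidden_dim()` evaluates to 132 and `LARGE.mlp_hidden_dim()` to 192.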
The number of cascades in unrolled networks has a fundamental impact on their performance. The results are summarized in Table 3. We observe that ASR boosts the reconstruction quality of E2E-VarNet.

Traditional Transformers for NLP receive a sequence of 1D token embeddings. The input to the Transformer encoder is this N × D representation, which we also refer to in the paper as the token representation, as each row in the representation corresponds to a token (in our case, an image patch) in the original input.
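To make the token representation concrete, the sketch below splits an image into non-overlapping square patches and flattens each patch into one row, giving an N × D matrix with one row per patch. This is an illustration rather than the paper's implementation: the function name `image_to_tokens` and the nested-list image layout are assumptions, and here D is simply the flattened patch length. Vision Transformers typically follow this step with a learned linear projection to the embedding dimension, which is omitted.

```python
def image_to_tokens(image, patch_size):
    """Flatten non-overlapping patches of an H x W x C image into N x D rows.

    `image` is a nested list indexed as image[row][col][channel]
    (a hypothetical layout chosen for illustration).
    Returns N = (H / patch_size) * (W / patch_size) tokens, each of length
    D = patch_size * patch_size * C.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    tokens = []
    # Visit patches in raster order: left to right, then top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = [
                value
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
                for value in image[row][col]
            ]
            tokens.append(token)
    return tokens


# Usage: a 4 x 4 single-channel image with 2 x 2 patches gives 4 tokens of length 4.
example_image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
example_tokens = image_to_tokens(example_image, patch_size=2)
```

Each row of `example_tokens` holds the pixels of one 2 × 2 patch, so the first row contains the values at positions (0,0), (0,1), (1,0) and (1,1) of the image.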