
Supplementary Materials for M3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Neural Information Processing Systems

The final ViT block's output feature is fed into decoders for multi-task predictions. Each decoder contains five convolutional layers (the first four of dimension 256 and the final one with dimension matching the task prediction) and four upsampling layers. Compared to the SoTA encoder-focused work Cross-Stitch, although M3ViT performs slightly lower on NYUD-v2 with two tasks, it achieves better performance in all other settings.
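The decoder layout described above can be traced shape-by-shape. The sketch below is an assumption-laden illustration, not the paper's code: it supposes the convolutions preserve spatial size (e.g. 3x3 with padding 1) and that each of the first four convs is followed by a 2x upsampling layer.

```python
def decoder_output_shape(in_channels, h, w, num_task_outputs, num_upsamples=4):
    """Trace the feature-map shape through the decoder sketched above:
    five convs with output channels [256, 256, 256, 256, num_task_outputs],
    assumed spatial-size-preserving, with a 2x upsample after each of the
    first four convs (these details are assumptions, not from the paper)."""
    channels = [256, 256, 256, 256, num_task_outputs]
    c = in_channels
    for i, out_c in enumerate(channels):
        c = out_c                 # conv changes only the channel count here
        if i < num_upsamples:     # 2x upsampling after the first four convs
            h, w = h * 2, w * 2
    return c, h, w
```

For instance, a 14x14 ViT feature map decoded to a hypothetical 40-class segmentation would come out at 40 x 224 x 224, since four 2x upsamples scale 14 to 224.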


DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning

Lopes, Ivan, Vu, Tuan-Hung, de Charette, Raoul

arXiv.org Artificial Intelligence

Multi-task learning has recently emerged as a promising solution for a comprehensive understanding of complex scenes. In addition to being memory-efficient, multi-task models, when appropriately designed, can facilitate the exchange of complementary signals across tasks. In this work, we jointly address 2D semantic segmentation and three geometry-related tasks: dense depth estimation, surface normal estimation, and edge estimation, demonstrating their benefits on both indoor and outdoor datasets. We propose a novel multi-task learning architecture that leverages pairwise cross-task exchange through correlation-guided attention and self-attention to enhance the overall representation learning for all tasks. We conduct extensive experiments across three multi-task setups, showing the advantages of our approach compared to competitive baselines in both synthetic and real-world benchmarks. Additionally, we extend our method to the novel multi-task unsupervised domain adaptation setting. Our code is available at https://github.com/cv-rits/DenseMTL
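The "pairwise cross-task exchange through correlation-guided attention" can be pictured with a minimal sketch. Everything below is an assumed toy version, not DenseMTL's implementation: features are plain lists of vectors, and the attention weights come from dot-product correlation between target- and source-task features.

```python
import math

def cross_task_attention(src, tgt):
    """Toy sketch (assumed, not the paper's code) of pairwise cross-task
    exchange: each target-task feature attends over source-task features,
    with weights from their scaled dot-product correlation, softmaxed.
    `src` and `tgt` are lists of equal-length feature vectors."""
    d = len(src[0])
    out = []
    for q in tgt:
        # correlation between the target feature and every source feature
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in src]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        # correlation-guided aggregation of source-task features
        out.append([sum(w * k[j] for w, k in zip(weights, src)) for j in range(d)])
    return out
```

In the full method this exchange would run for every ordered task pair (e.g. depth features enriching segmentation features and vice versa), alongside self-attention within each task.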


PAD-Net: An Efficient Framework for Dynamic Networks

He, Shwai, Ding, Liang, Dong, Daize, Liu, Boan, Yu, Fuqiang, Tao, Dacheng

arXiv.org Artificial Intelligence

Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of Experts (MoE), have been extensively explored as they can considerably improve the model's representation power at acceptable computational cost. The common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones, where all parameters are dynamic (at least within a single layer) and vary with the input. However, such a fully dynamic setting may cause redundant parameters and high deployment costs, limiting the applicability of dynamic networks to a broader range of tasks and models. Our main contribution is to challenge this conventional wisdom by proposing a partially dynamic network, PAD-Net, which transforms the redundant dynamic parameters into static ones. We further design Iterative Mode Partition to partition dynamic and static parameters efficiently. Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures, i.e., DY-Conv and MoE, on both image classification and GLUE benchmarks. Encouragingly, we surpass fully dynamic networks by $+0.7\%$ top-1 accuracy with only $30\%$ dynamic parameters for ResNet-50, and by $+1.9\%$ average score in language understanding with only $50\%$ dynamic parameters for BERT. Code will be released at: \url{https://github.com/Shwai-He/PAD-Net}.
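The core idea of a partially dynamic layer can be sketched in a few lines. This is a hypothetical illustration, not PAD-Net's implementation: it uses a one-shot top-k selection by an importance score as a stand-in for the paper's Iterative Mode Partition, and per-parameter mixing as a stand-in for its dynamic layers.

```python
def partition_modes(importance, dynamic_ratio=0.5):
    """Hypothetical sketch of the partially-dynamic idea: keep only the
    top-`dynamic_ratio` fraction of parameters (by an importance score)
    input-dependent, and freeze the rest as static. The scoring and the
    one-shot selection here are placeholders for the paper's Iterative
    Mode Partition procedure."""
    n = len(importance)
    k = int(n * dynamic_ratio)
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    dynamic = set(order[:k])          # the k most important parameters stay dynamic
    return [i in dynamic for i in range(n)]

def pad_layer_weights(static_w, dynamic_w, mask):
    """Assemble the layer's effective weights: masked positions take the
    input-conditioned dynamic value, the rest keep their static value."""
    return [d if m else s for s, d, m in zip(static_w, dynamic_w, mask)]
```

With `dynamic_ratio=0.5`, half the parameters vary with the input while the other half are shared across inputs, mirroring the 50%-dynamic BERT setting the abstract reports.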