Goto

Collaborating Authors

 adaptive depth network



Adaptive Depth Networks with Skippable Sub-Paths

Neural Information Processing Systems

Predictable adaptation of network depths can be an effective way to control inference latency and meet the resource condition of various devices. However, previous adaptive depth networks do not provide general principles and a formal explanation on why and which layers can be skipped, and, hence, their approaches are hard to be generalized and require long and complex training steps. In this paper, we present a practical approach to adaptive depth networks that is applicable to various networks with minimal training effort. In our approach, every hierarchical residual stage is divided into two sub-paths, and they are trained to acquire different properties through a simple self-distillation strategy. While the first sub-path is essential for hierarchical feature learning, the second one is trained to refine the learned features and minimize performance degradation if it is skipped. Unlike prior adaptive networks, our approach does not train every target sub-network in an iterative manner. At test time, however, we can connect these sub-paths in a combinatorial manner to select sub-networks of various accuracy-efficiency trade-offs from a single network. We provide a formal rationale for why the proposed training method can reduce overall prediction errors while minimizing the impact of skipping sub-paths. We demonstrate the generality and effectiveness of our approach with convolutional neural networks and transformers.



Adaptive Depth Networks with Skippable Sub-Paths

Neural Information Processing Systems

Predictable adaptation of network depths can be an effective way to control inference latency and meet the resource condition of various devices. However, previous adaptive depth networks do not provide general principles and a formal explanation on why and which layers can be skipped, and, hence, their approaches are hard to be generalized and require long and complex training steps. In this paper, we present a practical approach to adaptive depth networks that is applicable to various networks with minimal training effort. In our approach, every hierarchical residual stage is divided into two sub-paths, and they are trained to acquire different properties through a simple self-distillation strategy. While the first sub-path is essential for hierarchical feature learning, the second one is trained to refine the learned features and minimize performance degradation if it is skipped.


Adaptive Depth Networks with Skippable Sub-Paths

arXiv.org Artificial Intelligence

Systematic adaptation of network depths at runtime can be an effective way to control inference latency and meet the resource condition of various devices. However, previous depth adaptive networks do not provide general principles and a formal explanation on why and which layers can be skipped, and, hence, their approaches are hard to be generalized and require long and complex training steps. In this paper, we present an architectural pattern and training method for adaptive depth networks that can provide flexible accuracy-efficiency trade-offs in a single network. In our approach, every residual stage is divided into 2 consecutive sub-paths with different properties. While the first sub-path is mandatory for hierarchical feature learning, the other is optimized to incur minimal performance degradation even if it is skipped. Unlike previous adaptive networks, our approach does not iteratively self-distill a fixed set of sub-networks, resulting in significantly shorter training time. However, once deployed on devices, it can instantly construct sub-networks of varying depths to provide various accuracy-efficiency trade-offs in a single model. We provide a formal rationale for why the proposed architectural pattern and training method can reduce overall prediction errors while minimizing the impact of skipping selected sub-paths. We also demonstrate the generality and effectiveness of our approach with various residual networks, both from convolutional neural networks and vision transformers.