

Optimally Deep Networks -- Adapting Model Depth to Datasets for Superior Efficiency

Tareen, Shaharyar Ahmed Khan, Tareen, Filza Khan

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have provided brilliant performance across various tasks. However, this success often comes at the cost of unnecessarily large model sizes, high computational demands, and substantial memory footprints. Typically, powerful architectures are trained at full depth, but not all datasets or tasks require such high model capacity. Training big and deep architectures on relatively low-complexity datasets frequently leads to wasted computation, unnecessary energy consumption, and excessive memory usage, which in turn makes deployment of models on resource-constrained devices impractical. To address this problem, we introduce the concept of Optimally Deep Networks (ODNs), which provides a balance between model depth and task complexity. Specifically, we propose a NAS-like training strategy called progressive depth expansion, which begins by training neural networks at shallower depths and incrementally increases their depth as the earlier blocks converge, continuing this process until the target accuracy is reached. ODNs use only the optimal depth for the tasks at hand, removing redundant layers. This cuts down future training and inference costs, lowers the model memory footprint, enhances computational efficiency, and facilitates deployment on edge devices. Empirical results show that the optimal depths of ResNet-18 and ResNet-34 for MNIST and SVHN achieve up to 98.64% and 96.44% reductions in memory footprint, while maintaining competitive accuracies of 99.31% and 96.08%, respectively.
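The progressive depth expansion loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `train_stack` stands in for training the current shallow stack to convergence, and the toy accuracy model (diminishing returns with depth) is an assumption made purely for demonstration.

```python
def progressive_depth_expansion(train_stack, target_acc, max_depth):
    """Grow the network one block at a time until the target accuracy is met.

    train_stack(depth) is a placeholder: train the current stack of `depth`
    blocks to convergence and return its validation accuracy.
    """
    depth = 1
    acc = train_stack(depth)            # train the shallowest configuration
    while acc < target_acc and depth < max_depth:
        depth += 1                      # append one more block ...
        acc = train_stack(depth)        # ... and continue training
    return depth, acc                   # shallowest depth reaching the target


# Toy stand-in for training: accuracy saturates as depth grows.
toy_acc = lambda d: 1.0 - 0.5 ** d

depth, acc = progressive_depth_expansion(toy_acc, target_acc=0.99, max_depth=10)
print(depth, round(acc, 4))  # → 7 0.9922
```

With this toy curve, the loop stops at depth 7, the shallowest stack exceeding 99% accuracy; deeper blocks would add cost without being needed for the target.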


Peeling Context from Cause for Multimodal Molecular Property Prediction

Li, Tao, Hou, Kaiyuan, Vinh, Tuan, Yang, Carl, Raj, Monika

arXiv.org Artificial Intelligence

Deep models are used for molecular property prediction, yet they are often hard to interpret and may rely on spurious context rather than causal structure, which degrades reliability under distribution shift and harms predictive performance. We introduce CLaP (Causal Layerwise Peeling), a framework that separates causal signal from context in a layerwise manner and integrates diverse graph representations of molecules. At each layer, a causal block performs a soft split into causal and trivial branches, fuses causal evidence across modalities, and progressively peels batch-coupled context to concentrate on label-relevant structure, thereby limiting shortcut signals and stabilizing layerwise refinement. We also obtain atom-level causal saliency maps that highlight substructures responsible for a prediction, providing actionable guidance for targeted molecular edits. Case studies confirm the accuracy of these maps and their alignment with chemical intuition. By peeling context from cause at every layer, the model delivers predictors that are accurate and interpretable for molecular design. Designing molecules with desired properties is a central goal in drug discovery and materials design (Sanchez-Lengeling & Aspuru-Guzik, 2018). Graph-based deep learning is effective for property prediction (Wu et al., 2018; Hinton et al., 2006; Bengio & LeCun, 2007; Goodfellow et al., 2016). However, models often exploit spurious correlations tied to datasets or batches (Geirhos et al., 2020), which hurts reliability under distribution shift.
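One common way to realize the "soft split into causal and trivial branches" mentioned above is a learned sigmoid gate per feature, with the causal share routed forward and the remainder treated as context to be peeled. The sketch below is an assumption about the mechanism, not CLaP's actual code; `gate_logits` would normally be produced by a small learned module.

```python
import math

def soft_split(features, gate_logits):
    """Softly split one layer's features into causal and trivial branches.

    sigmoid(gate_logits[i]) gives feature i's causal weight; the complement
    goes to the trivial (context) branch, so the two branches always sum
    back to the original features.
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    causal = [g * f for g, f in zip(gates, features)]
    trivial = [(1.0 - g) * f for g, f in zip(gates, features)]
    return causal, trivial


h = [2.0, -1.0, 0.5]
causal, trivial = soft_split(h, gate_logits=[4.0, 0.0, -4.0])
# A large positive logit sends a feature almost entirely to the causal branch;
# a large negative logit sends it to the trivial branch.
```

Because the split is soft and differentiable, gradients can shape the gates during training; a hard (binary) split would require straight-through estimators or sampling instead.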


Optimal Depth of Neural Networks

Qi, Qian

arXiv.org Artificial Intelligence

Determining the optimal depth of a neural network is a fundamental yet challenging problem, typically resolved through resource-intensive experimentation. This paper introduces a formal theoretical framework to address this question by recasting the forward pass of a deep network, specifically a Residual Network (ResNet), as an optimal stopping problem. We model the layer-by-layer evolution of hidden representations as a sequential decision process where, at each layer, a choice is made between halting computation to make a prediction or continuing to a deeper layer for a potentially more refined representation. This formulation captures the intrinsic trade-off between accuracy and computational cost. Our primary theoretical contribution is a proof that, under a plausible condition of diminishing returns on the residual functions, the expected optimal stopping depth is provably finite, even in an infinite-horizon setting. We leverage this insight to propose a novel and practical regularization term, $\mathcal{L}_{\rm depth}$, that encourages the network to learn representations amenable to efficient, early exiting. We demonstrate the generality of our framework by extending it to the Transformer architecture and exploring its connection to continuous-depth models via free-boundary problems. Empirical validation on ImageNet confirms that our regularizer successfully induces the theoretically predicted behavior, leading to significant gains in computational efficiency without compromising, and in some cases improving, final model accuracy.