sm3
SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization
We propose SMMF (Square-Matricized Momentum Factorization), a memory-efficient optimizer that reduces the memory requirement of the widely used adaptive learning rate optimizers, such as Adam, by up to 96%. SMMF enables flexible and efficient factorization of an arbitrary rank (shape) of the first and second momentum tensors during optimization, based on the proposed square-matricization and one-time single matrix factorization. From this, it becomes effectively applicable to any rank (shape) of momentum tensors, i.e., bias, matrix, and any rank-d tensors, prevalent in various deep model architectures, such as CNNs (high rank) and Transformers (low rank), in contrast to existing memory-efficient optimizers that applies only to a particular (rank-2) momentum tensor, e.g., linear layers. We conduct a regret bound analysis of SMMF, which shows that it converges similarly to non-memory-efficient adaptive learning rate optimizers, such as AdamNC, providing a theoretical basis for its competitive optimization capability. In our experiment, SMMF takes up to 96% less memory compared to state-of-the-art memory efficient optimizers, e.g., Adafactor, CAME, and SM3, while achieving comparable model performance on various CNN and Transformer tasks.
Deep Learning-based Detection of Bacterial Swarm Motion Using a Single Image
Li, Yuzhu, Li, Hao, Chen, Weijie, O'Riordan, Keelan, Mani, Neha, Qi, Yuxuan, Liu, Tairan, Mani, Sridhar, Ozcan, Aydogan
Distinguishing between swarming and swimming, the two principal forms of bacterial movement, holds significant conceptual and clinical relevance. This is because bacteria that exhibit swarming capabilities often possess unique properties crucial to the pathogenesis of infectious diseases and may also have therapeutic potential. Here, we report a deep learning-based swarming classifier that rapidly and autonomously predicts swarming probability using a single blurry image. Compared with traditional video-based, manually-processed approaches, our method is particularly suited for high-throughput environments and provides objective, quantitative assessments of swarming probability. The swarming classifier demonstrated in our work was trained on Enterobacter sp. SM3 and showed good performance when blindly tested on new swarming (positive) and swimming (negative) test images of SM3, achieving a sensitivity of 97.44% and a specificity of 100%. Furthermore, this classifier demonstrated robust external generalization capabilities when applied to unseen bacterial species, such as Serratia marcescens DB10 and Citrobacter koseri H6. It blindly achieved a sensitivity of 97.92% and a specificity of 96.77% for DB10, and a sensitivity of 100% and a specificity of 97.22% for H6. This competitive performance indicates the potential to adapt our approach for diagnostic applications through portable devices or even smartphones. This adaptation would facilitate rapid, objective, on-site screening for bacterial swarming motility, potentially enhancing the early detection and treatment assessment of various diseases, including inflammatory bowel diseases (IBD) and urinary tract infections (UTI).
Monte Carlo is a good sampling strategy for polynomial approximation in high dimensions
Adcock, Ben, Brugiapaglia, Simone
This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering - notably, some of those arising from parametric modelling and computational uncertainty quantification. It is common to use Monte Carlo sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known that such a strategy is theoretically suboptimal. Specifically, there are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically, i.e., like $c \cdot n^2 \cdot \log(n)$ as $n \rightarrow \infty$. This well-documented phenomenon has led to a concerted effort over the last decade to design improved, and moreover, near-optimal strategies, whose sample complexities scale log-linearly, or even linearly in $n$. In this work we demonstrate that Monte Carlo is actually a perfectly good strategy in high dimensions, despite its apparent suboptimality. We first document this phenomenon empirically via a systematic set of numerical experiments. Next, we present a theoretical analysis that rigorously justifies this fact in the case of holomorphic functions of infinitely-many variables. We show that there is a least-squares approximation based on $m$ Monte Carlo samples whose error decays algebraically fast in $m/\log(m)$, with a rate that is the same as that of the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial subspace in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes.
Memory-Efficient Adaptive Optimization for Large-Scale Learning
Anil, Rohan, Gupta, Vineet, Koren, Tomer, Singer, Yoram
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training very large deep models.