Goto

Collaborating Authors

 blk


The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Otsuka, Hikari, Chijiwa, Daiki, Okoshi, Yasuyuki, Fujiki, Daichi, Takeuchi, Susumu, Motomura, Masato

arXiv.org Artificial Intelligence

The strong lottery ticket hypothesis (SLTH) conjectures that high-performing subnetworks, called strong lottery tickets (SLTs), are hidden in randomly initialized neural networks. Although recent theoretical studies have established the SLTH across various neural architectures, the SLTH for transformer architectures still lacks theoretical understanding. In particular, the current theory of the SLTH does not yet account for the multi-head attention (MHA) mechanism, a core component of transformers. To address this gap, we introduce a theoretical analysis of the existence of SLTs within MHAs. We prove that, if a randomly initialized MHA of $H$ heads and input dimension $d$ has the hidden dimension $O(d\log(Hd^{3/2}))$ for the key and value, it contains an SLT that approximates an arbitrary MHA with the same input dimension with high probability. Furthermore, by leveraging this theory for MHAs, we extend the SLTH to transformers without normalization layers. We empirically validate our theoretical findings, demonstrating that the approximation error between the SLT within a source model (MHA and transformer) and an approximate target counterpart decreases exponentially by increasing the hidden dimension of the source model.


Zero-Knowledge Proofs in Sublinear Space

Nye, Logan

arXiv.org Artificial Intelligence

Zero-knowledge proofs allow verification of computations without revealing private information. However, existing systems require memory proportional to the computation size, which has historically limited use in large-scale applications and on mobile and edge devices. We solve this fundamental bottleneck by developing, to our knowledge, the first proof system with sublinear memory requirements for mainstream cryptographic constructions. Our approach processes computations in blocks using a space-efficient tree algorithm, reducing memory from linear scaling to square-root scaling--from $Θ(T)$ to $O(\sqrt{T} + \log T \log\log T)$ for computation size $T$--while maintaining the same proof generation time through a constant number of streaming passes. For widely-used linear polynomial commitment schemes (KZG/IPA), our method produces identical proofs and verification when using the same parameters and hashing only aggregate commitments into the challenge generation, preserving proof size and security. Hash-based systems also achieve square-root memory scaling though with slightly different proof structures. This advance enables zero-knowledge proofs on everyday devices and makes previously infeasible large computations verifiable, fundamentally democratizing access to privacy-preserving computation. Space-efficient zero knowledge proof systems create opportunities to reshape how trust is established in digital systems--from enabling widespread participation in decentralized networks to making verifiable scientific computing practical at unprecedented scales.


Efficient Sparse Training with Structured Dropout

Lo, Andy

arXiv.org Artificial Intelligence

Dropout is a common regularisation technique in deep learning that improves generalisation. Even though it introduces sparsity and thus potential for higher throughput, it usually cannot bring speed-ups on GPUs due to its unstructured nature. In this project, I experiment with SparseDrop, a structured, hardware-friendly variant of dropout that can exploit such sparsity. I provide a CUDA implementation of SparseDrop, achieving speed-ups against its dense counterpart even at low sparsity levels. The empirical results demonstrate that SparseDrop provides similar, or sometimes even better, regularisation properties as standard dropout. This suggests its potential as a drop-in replacement to standard dropout with faster training speeds. The source code is available at https://github.com/andylolu2/sparse-dropout


MoRe Fine-Tuning with 10x Fewer Parameters

Tan, Wenxuan, Roberts, Nicholas, Huang, Tzu-Heng, Zhao, Jitian, Cooper, John, Guo, Samuel, Duan, Chengyu, Sala, Frederic

arXiv.org Artificial Intelligence

Parameter-efficient fine-tuning (PEFT) techniques have unlocked the potential to cheaply and easily specialize large pretrained models. However, the most prominent approaches, like low-rank adapters (LoRA), depend on heuristics or rules-of-thumb for their architectural choices -- potentially limiting their performance for new models and architectures. This limitation suggests that techniques from neural architecture search could be used to obtain optimal adapter architectures, but these are often expensive and difficult to implement. We address this challenge with Monarch Rectangular Fine-tuning (MoRe), a simple framework to search over adapter architectures that relies on the Monarch matrix class. Theoretically, we show that MoRe is more expressive than LoRA. Empirically, our approach is more parameter-efficient and performant than state-of-the-art PEFTs on a range of tasks and models, with as few as 5\% of LoRA's parameters.


Learning Coherent Clusters in Weakly-Connected Network Systems

Min, Hancheng, Mallada, Enrique

arXiv.org Artificial Intelligence

We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built, where each node represents the aggregate dynamics of each coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weight stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.


Replay and Synthetic Speech Detection with Res2net Architecture

Li, Xu, Li, Na, Weng, Chao, Liu, Xunying, Su, Dan, Yu, Dong, Meng, Helen

arXiv.org Artificial Intelligence

Existing approaches for replay and synthetic speech detection still lack generalizability to unseen spoofing attacks. This work proposes to leverage a novel model structure, so-called Res2Net, to improve the anti-spoofing countermeasure's generalizability. Res2Net mainly modifies the ResNet block to enable multiple feature scales. Specifically, it splits the feature maps within one block into multiple channel groups and designs a residual-like connection across different channel groups. Such connection increases the possible receptive fields, resulting in multiple feature scales. This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks. It also decreases the model size compared to ResNet-based models. Experimental results show that the Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin in both physical access (PA) and logical access (LA) of the ASVspoof 2019 corpus. Moreover, integration with the squeeze-and-excitation (SE) block can further enhance performance. For feature engineering, we investigate the generalizability of Res2Net combined with different acoustic features, and observe that the constant-Q transform (CQT) achieves the most promising performance in both PA and LA scenarios. Our best single system outperforms other state-of-the-art single systems in both PA and LA of the ASVspoof 2019 corpus.


BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Ma, Xiaolong, Li, Zhengang, Gong, Yifan, Zhang, Tianyun, Niu, Wei, Zhan, Zheng, Zhao, Pu, Tang, Jian, Lin, Xue, Ren, Bin, Wang, Yanzhi

arXiv.org Artificial Intelligence

Accelerating DNN execution on various resource-limited computing platforms has been a longstanding problem. Prior works utilize l 1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both of the pruning dimensions and pruning methods lack universality, which leads to degraded performance and limited applicability. To solve the problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method. Our framework is universal, which can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for- acceleration task, we also integrate compiler-based code optimization into our framework that can perform DNN inference in a real-time manner. To the best of our knowledge, it is the first time that the weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise. 1 Introduction Deep Neural Networks (DNNs) such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been extensively adopted in various artificial intelligence (AI) systems. However, accelerating the computational intensive DNN inference is very challenging for many AI applications, especially those with critical time constraints, such as self-driving cars [ Nugraha et al., 2017 ] and real-time translation [ Gehring et al., 2016 ] .