Goto

Collaborating Authors

 channel importance


Superposition unifies power-law training dynamics

Chen, Zixin Jessie, Chen, Hao, Liu, Yizhou, Gore, Jeff

arXiv.org Machine Learning

We investigate the role of feature superposition in the emergence of power-law training dynamics using a teacher-student framework. We first derive an analytic theory for training without superposition, establishing that the power-law training exponent depends on both the input data statistics and channel importance. Remarkably, we discover that a superposition bottleneck induces a transition to a universal power-law exponent of $\sim 1$, independent of data and channel statistics. This one over time training with superposition represents an up to tenfold acceleration compared to the purely sequential learning that takes place in the absence of superposition. Our finding that superposition leads to rapid training with a data-independent power law exponent may have important implications for a wide range of neural networks that employ superposition, including production-scale large language models.


PySeizure: A single machine learning classifier framework to detect seizures in diverse datasets

Chybowski, Bartlomiej, Abdullateef, Shima, Haule, Hollan, Gonzalez-Sulser, Alfredo, Escudero, Javier

arXiv.org Artificial Intelligence

Reliable seizure detection is critical for diagnosing and managing epilepsy, yet clinical workflows remain dependent on time-consuming manual EEG interpretation. While machine learning has shown promise, existing approaches often rely on dataset-specific optimisations, limiting their real-world applicability and reproducibility. Here, we introduce an innovative, open-source machine-learning framework that enables robust and generalisable seizure detection across varied clinical datasets. We evaluate our approach on two publicly available EEG datasets that differ in patient populations and electrode configurations. To enhance robustness, the framework incorporates an automated pre-processing pipeline to standardise data and a majority voting mechanism, in which multiple models independently assess each second of EEG before reaching a final decision. We train, tune, and evaluate models within each dataset, assessing their cross-dataset transferability. Our models achieve high within-dataset performance (AUC 0.904+/-0.059 for CHB-MIT and 0.864+/-0.060 for TUSZ) and demonstrate strong generalisation across datasets despite differences in EEG setups and populations (AUC 0.615+/-0.039 for models trained on CHB-MIT and tested on TUSZ and 0.762+/-0.175 in the reverse case) without any post-processing. Furthermore, a mild post-processing improved the within-dataset results to 0.913+/-0.064 and 0.867+/-0.058 and cross-dataset results to 0.619+/-0.036 and 0.768+/-0.172. These results underscore the potential of, and essential considerations for, deploying our framework in diverse clinical settings. By making our methodology fully reproducible, we provide a foundation for advancing clinically viable, dataset-agnostic seizure detection systems. This approach has the potential for widespread adoption, complementing rather than replacing expert interpretation, and accelerating clinical integration.


MIA-Mind: A Multidimensional Interactive Attention Mechanism Based on MindSpore

Qin, Zhenkai, Liang, Jiaquan, Fang, Qiao

arXiv.org Artificial Intelligence

Attention mechanisms have significantly advanced deep learning by enhancing feature representation through selective focus. However, existing approaches often independently model channel importance and spatial saliency, overlooking their inherent interdependence and limiting their effectiveness. To address this limitation, we propose MIA-Mind, a lightweight and modular Multidimensional Interactive Attention Mechanism, built upon the MindSpore framework. MIA-Mind jointly models spatial and channel features through a unified cross-attentive fusion strategy, enabling fine-grained feature recalibration with minimal computational overhead. Extensive experiments are conducted on three representative datasets: on CIFAR-10, MIA-Mind achieves an accuracy of 82.9\%; on ISBI2012, it achieves an accuracy of 78.7\%; and on CIC-IDS2017, it achieves an accuracy of 91.9\%. These results validate the versatility, lightweight design, and generalization ability of MIA-Mind across heterogeneous tasks. Future work will explore the extension of MIA-Mind to large-scale datasets, the development of ada,ptive attention fusion strategies, and distributed deployment to further enhance scalability and robustness.


Filter Pruning For CNN With Enhanced Linear Representation Redundancy

Wang, Bojue, Ma, Chunmei, Liu, Bin, Liu, Nianbo, Zhu, Jinqi

arXiv.org Artificial Intelligence

Structured network pruning excels non-structured methods because they can take advantage of the thriving developed parallel computing techniques. In this paper, we propose a new structured pruning method. Firstly, to create more structured redundancy, we present a data-driven loss function term calculated from the correlation coefficient matrix of different feature maps in the same layer, named CCM-loss. This loss term can encourage the neural network to learn stronger linear representation relations between feature maps during the training from the scratch so that more homogenous parts can be removed later in pruning. CCM-loss provides us with another universal transcendental mathematical tool besides L*-norm regularization, which concentrates on generating zeros, to generate more redundancy but for the different genres. Furthermore, we design a matching channel selection strategy based on principal components analysis to exploit the maximum potential ability of CCM-loss. In our new strategy, we mainly focus on the consistency and integrality of the information flow in the network. Instead of empirically hard-code the retain ratio for each layer, our channel selection strategy can dynamically adjust each layer's retain ratio according to the specific circumstance of a per-trained model to push the prune ratio to the limit. Notably, on the Cifar-10 dataset, our method brings 93.64% accuracy for pruned VGG-16 with only 1.40M parameters and 49.60M FLOPs, the pruned ratios for parameters and FLOPs are 90.6% and 84.2%, respectively. For ResNet-50 trained on the ImageNet dataset, our approach achieves 42.8% and 47.3% storage and computation reductions, respectively, with an accuracy of 76.23%. Our code is available at https://github.com/Bojue-Wang/CCM-LRR.


Learning Channel Importance for High Content Imaging with Interpretable Deep Input Channel Mixing

Siegismund, Daniel, Wieser, Mario, Heyse, Stephan, Steigele, Stephan

arXiv.org Machine Learning

Uncovering novel drug candidates for treating complex diseases remain one of the most challenging tasks in early discovery research. To tackle this challenge, biopharma research established a standardized high content imaging protocol that tags different cellular compartments per image channel. In order to judge the experimental outcome, the scientist requires knowledge about the channel importance with respect to a certain phenotype for decoding the underlying biology. In contrast to traditional image analysis approaches, such experiments are nowadays preferably analyzed by deep learning based approaches which, however, lack crucial information about the channel importance. To overcome this limitation, we present a novel approach which utilizes multi-spectral information of high content images to interpret a certain aspect of cellular biology. To this end, we base our method on image blending concepts with alpha compositing for an arbitrary number of channels. More specifically, we introduce DCMIX, a lightweight, scaleable and end-to-end trainable mixing layer which enables interpretable predictions in high content imaging while retaining the benefits of deep learning based methods. We employ an extensive set of experiments on both MNIST and RXRX1 datasets, demonstrating that DCMIX learns the biologically relevant channel importance without scarifying prediction performance.


CP$^3$: Channel Pruning Plug-in for Point-based Networks

Huang, Yaomin, Liu, Ning, Che, Zhengping, Xu, Zhiyuan, Shen, Chaomin, Peng, Yaxin, Zhang, Guixu, Liu, Xinmei, Feng, Feifei, Tang, Jian

arXiv.org Artificial Intelligence

Channel pruning can effectively reduce both computational cost and memory footprint of the original network while keeping a comparable accuracy performance. Though great success has been achieved in channel pruning for 2D image-based convolutional networks (CNNs), existing works seldom extend the channel pruning methods to 3D point-based neural networks (PNNs). Directly implementing the 2D CNN channel pruning methods to PNNs undermine the performance of PNNs because of the different representations of 2D images and 3D point clouds as well as the network architecture disparity. In this paper, we proposed CP$^3$, which is a Channel Pruning Plug-in for Point-based network. CP$^3$ is elaborately designed to leverage the characteristics of point clouds and PNNs in order to enable 2D channel pruning methods for PNNs. Specifically, it presents a coordinate-enhanced channel importance metric to reflect the correlation between dimensional information and individual channel features, and it recycles the discarded points in PNN's sampling process and reconsiders their potentially-exclusive information to enhance the robustness of channel pruning. Experiments on various PNN architectures show that CP$^3$ constantly improves state-of-the-art 2D CNN pruning approaches on different point cloud tasks. For instance, our compressed PointNeXt-S on ScanObjectNN achieves an accuracy of 88.52% with a pruning rate of 57.8%, outperforming the baseline pruning methods with an accuracy gain of 1.94%.