
Neural Information Processing Systems

But these methods are unable to improve throughput (frames per second) on real-life hardware while simultaneously preserving robustness to adversarial perturbations.


LiteVPNet: A Lightweight Network for Video Encoding Control in Quality-Critical Applications

Vibhoothi, Vibhoothi, Pitié, François, Kokaram, Anil

arXiv.org Artificial Intelligence

In the last decade, video workflows in the cinema production ecosystem have presented new use cases for video streaming technology. These new workflows, e.g. in On-set Virtual Production, demand both precise quality control and energy efficiency. Existing approaches to transcoding often fall short of these requirements, either through a lack of quality control or through computational overhead. To fill this gap, we present a lightweight neural network (LiteVPNet) for accurately predicting the Quantisation Parameter for NVENC AV1 encoders that achieves a specified VMAF score. We use low-complexity features, including bitstream characteristics, video complexity measures, and CLIP-based semantic embeddings. Our results demonstrate that LiteVPNet achieves mean VMAF errors below 1.2 points across a wide range of quality targets. Notably, LiteVPNet achieves VMAF errors within 2 points for over 87% of our test corpus, cf. approximately 61% with state-of-the-art methods. LiteVPNet's performance across various quality regions highlights its applicability to enhancing high-value content transport and streaming for more energy-efficient, high-quality media experiences.
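The abstract describes a regressor that maps low-complexity per-clip features plus a target VMAF score to an encoder QP. A minimal numpy sketch of that interface is below; the layer sizes, the feature dimension, and the random initial weights are illustrative assumptions, not the paper's architecture, and a real model would of course be trained on (features, target-VMAF, QP) triples first.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class TinyQPPredictor:
    """Hypothetical two-layer MLP: per-clip features (bitstream stats,
    complexity measures, CLIP embedding) concatenated with a normalised
    target VMAF score are mapped to a single predicted QP value."""

    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, 1)) * 0.1
        self.b2 = np.zeros(1)

    def predict_qp(self, features, target_vmaf):
        # append the quality target (VMAF is on a 0-100 scale) to the features
        x = np.concatenate([features, [target_vmaf / 100.0]])
        h = relu(x @ self.W1 + self.b1)
        qp = (h @ self.W2 + self.b2)[0]
        # clamp to a valid quantiser range (AV1 uses 0..255)
        return float(np.clip(qp, 0, 255))

model = TinyQPPredictor(in_dim=9)          # 8 features + 1 quality target
qp = model.predict_qp(np.zeros(8), target_vmaf=90.0)
```

One QP prediction per (clip, quality target) pair keeps inference cost negligible next to the encode itself, which is the point of using a lightweight network here.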


Lightweight Multi-Scale Feature Extraction with Fully Connected LMF Layer for Salient Object Detection

Shi, Yunpeng, Chen, Lei, Shen, Xiaolu, Guo, Yanju

arXiv.org Artificial Intelligence

Since AlexNet [1] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, deep neural networks (DNNs) have rapidly evolved, surpassing traditional machine learning methods in accuracy and becoming the dominant approach in computer vision. By stacking multiple convolutional layers, AlexNet enabled the network to learn increasingly complex image features, profoundly influencing subsequent network architectures, such as VGG [2]. However, despite significant performance improvements, DNNs often suffer from an excessive number of parameters and high computational costs, making them challenging to deploy on resource-constrained devices. Moreover, as network depth and complexity increase, performance gains tend to diminish. Consequently, developing efficient neural networks with fewer parameters and reduced computational complexity has become a crucial research direction, driving the growing interest in lightweight network design. Optimization strategies for lightweight networks generally fall into two categories: lightweight model design and model compression. Unlike model compression, which reduces redundancy in pre-trained models, lightweight model design fundamentally lowers computational complexity and parameter count, avoiding potential performance degradation caused by compression techniques. Studies have shown that multi-scale feature learning is essential for enhancing model representation capabilities, particularly in dense prediction tasks such as image segmentation and salient object detection (SOD). Traditional convolutional neural networks (CNNs), including VGG and ResNet [3], achieve multi-scale feature learning by encoding high-level semantic information in deeper layers while preserving low-level details in shallower ones.
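The multi-scale idea the paragraph ends on can be illustrated independently of any particular network: run the same input through branches with different receptive-field sizes and stack the results. The sketch below is a 1-D toy, not the paper's LMF layer; the moving-average operator stands in for a convolution, and the scale set (1, 3, 5) is an arbitrary choice.

```python
import numpy as np

def avg_pool1d(x, k):
    """Toy 'receptive field' operator: moving average with window k,
    same-length output via edge padding (k assumed odd)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([xp[i:i + k].mean() for i in range(len(x))])

def multi_scale_features(x, scales=(1, 3, 5)):
    """Stack views of the signal at several receptive-field sizes, so a
    later layer sees both fine detail (k=1) and wider context (k=5)."""
    return np.stack([avg_pool1d(x, k) for k in scales])

feats = multi_scale_features(np.arange(8.0))   # shape (3, 8)
```

In a real SOD network the branches would be convolutions with different kernel sizes or dilations, and the stacked maps would be fused (e.g. by a 1x1 convolution or, as in this paper's title, a fully connected layer).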


EEGMobile: Enhancing Speed and Accuracy in EEG-Based Gaze Prediction with Advanced Mobile Architectures

Liang, Teng, Damoah, Andrews

arXiv.org Artificial Intelligence

Electroencephalography (EEG) analysis is an important domain in the realm of Brain-Computer Interface (BCI) research. To ensure BCI devices are capable of providing practical applications in the real world, brain signal processing techniques must be fast, accurate, and resource-conscious to deliver low-latency neural analytics. This study presents a model that leverages a pre-trained MobileViT alongside Knowledge Distillation (KD) for EEG regression tasks. Our results showcase that this model is capable of performing at a level comparable (only 3% lower) to the previous State-Of-The-Art (SOTA) on the EEGEyeNet Absolute Position Task while being 33% faster and 60% smaller. Our research presents a cost-effective model applicable to resource-constrained devices and contributes to expanding future research on lightweight, mobile-friendly models for EEG regression.
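The abstract combines a pre-trained MobileViT student with knowledge distillation for a regression task. A common way to distil for regression, sketched below under that assumption (the paper's exact loss and weighting are not given here), is to blend the ground-truth error with an error against the teacher's predictions.

```python
import numpy as np

def kd_regression_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Knowledge-distillation loss for regression: a weighted sum of the
    'hard' loss against ground truth and the 'soft' loss against the
    teacher's outputs. alpha=1 recovers plain supervised training."""
    hard = np.mean((student_pred - target) ** 2)
    soft = np.mean((student_pred - teacher_pred) ** 2)
    return alpha * hard + (1.0 - alpha) * soft

loss = kd_regression_loss(np.array([0.0]), np.array([2.0]), np.array([0.0]))
```

The soft term lets the small student inherit the teacher's smoother function approximation, which is how a 60% smaller model can stay within a few percent of the SOTA.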


RepAct: The Re-parameterizable Adaptive Activation Function

Wu, Xian, Tao, Qingchuan, Wang, Shuang

arXiv.org Artificial Intelligence

Since the neural network resurgence began with AlexNet [1], the design of activation functions has garnered continuous attention [2]. Activation functions introduce non-linearity into networks and play a critical role in feature extraction. In traditional neural network design, activation functions for different layers and networks are typically selected based on manual design experience [3, 4] or adapted through Neural Architecture Search (NAS) [5, 6]. The emergence of adaptive activation functions [6, 7, 8] effectively improved network performance. Subsequently, various dynamic adaptive parameters were introduced along different dimensions of the feature map [9, 10, 11]. Although the added parameters and computation were small, the memory cost of element-wise operations often formed a bottleneck for lightweight network inference [12]. When deploying lightweight networks on resource-constrained edge devices, real-time performance requirements impose strict limits on model parameters, computational power, and memory operations [12, 13, 14, 15]. Convolutional neural networks exhibit sparsity in their activations [16], which prevents lightweight networks from fully utilizing model capacity to learn features from task data. We note that re-parameterizable convolutional structures [17, 18, 19, 20, 21], by virtue of their multi-branch architecture during training, enhance the network's ability to capture features.
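The re-parameterization idea carries over from convolutions to activations: train a multi-branch form, then fuse it into a single cheap function for inference. The sketch below is an illustrative instance, not RepAct's actual formulation: a weighted ReLU branch plus an identity branch, which fuses exactly into a leaky-ReLU-style piecewise-linear function.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class RepActivation:
    """Illustrative re-parameterizable activation (hypothetical, not the
    paper's exact design): during training it is a weighted sum of a ReLU
    branch and an identity branch; fuse() collapses both branches into one
    piecewise-linear function with identical outputs."""

    def __init__(self, w_relu=1.0, w_id=0.1):
        self.w_relu, self.w_id = w_relu, w_id

    def forward_train(self, x):
        # multi-branch form used during training
        return self.w_relu * relu(x) + self.w_id * x

    def fuse(self):
        # single-branch form for inference:
        # slope (w_relu + w_id) for x >= 0, slope w_id for x < 0
        pos, neg = self.w_relu + self.w_id, self.w_id
        return lambda x: np.where(x >= 0, pos * x, neg * x)
```

The fused form does one multiply per element, so the extra branch capacity seen during training costs nothing at deployment, which is exactly the property the abstract targets for edge devices.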


Streamlining Redundant Layers to Compress Large Language Models

Chen, Xiaodong, Hu, Yuxuan, Zhang, Jing, Wang, Yanling, Li, Cuiping, Chen, Hong

arXiv.org Artificial Intelligence

This paper introduces LLM-Streamline, a novel layer pruning approach for large language models. It is based on the observation that different layers have varying impacts on hidden states, enabling the identification of less important layers. LLM-Streamline comprises two parts: layer pruning, which removes consecutive layers with the lowest importance based on target sparsity, and layer replacement, where a lightweight network is trained to replace the pruned layers to mitigate performance loss. Additionally, a new metric called "stability" is proposed to address the limitations of accuracy in evaluating model compression. Experiments show that LLM-Streamline surpasses previous state-of-the-art pruning methods in both accuracy and stability.
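The observation the method rests on, that some layers barely change the hidden state, suggests a concrete pruning criterion. A minimal numpy sketch under that assumption (the paper's exact importance metric is not reproduced here): score each layer by how far its output hidden state moves from its input, then drop the window of consecutive layers with the smallest total score.

```python
import numpy as np

def layer_importance(h_in, h_out):
    """Importance as 1 - cosine similarity between a layer's input and
    output hidden states: a layer that leaves the hidden state nearly
    unchanged scores near zero and is a pruning candidate."""
    cos = np.dot(h_in, h_out) / (np.linalg.norm(h_in) * np.linalg.norm(h_out))
    return 1.0 - cos

def least_important_window(importances, k):
    """Start index of the k consecutive layers with the smallest summed
    importance; these are the layers to remove (and optionally replace
    with one trained lightweight module)."""
    sums = [sum(importances[i:i + k]) for i in range(len(importances) - k + 1)]
    return int(np.argmin(sums))
```

In LLM-Streamline the pruned window is then substituted by a small trained network, so the pipeline is: score layers, cut the flattest window, train the replacement to mimic the removed layers' input-output mapping.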


Deep Space Separable Distillation for Lightweight Acoustic Scene Classification

Ye, ShuQi, Tian, Yuan

arXiv.org Artificial Intelligence

Acoustic scene classification (ASC) is highly important in real-world applications. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are not yet lightweight enough, and their performance remains unsatisfactory. To solve these problems, we propose a deep space separable distillation network. First, the network performs high-low frequency decomposition on the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Second, we design three lightweight operators for ASC: Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators exhibit highly efficient feature extraction in acoustic scene classification tasks. The experimental results demonstrate that the proposed method achieves a performance gain of 9.8% over currently popular deep learning methods, while having a smaller parameter count and lower computational complexity.
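The savings from separable operators are easy to quantify. The arithmetic below is for a standard depthwise-separable factorization (depthwise k x k plus pointwise 1x1), which is the textbook baseline behind operators like the SC above; the paper's SC/OSC/SPC variants will differ in detail.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Weights after depthwise-separable factorization:
    one k x k filter per input channel, then a 1x1 pointwise conv."""
    return c_in * k * k + c_in * c_out

# 64 -> 64 channels with 3x3 kernels:
# standard: 36864 weights, separable: 4672 weights (~7.9x fewer)
```

The same factor applies to multiply-accumulates per output position, which is why separable operators dominate lightweight audio and vision backbones.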


Unifying Synergies between Self-supervised Learning and Dynamic Computation

Krishna, Tarun, Rai, Ayush K, Drimbarean, Alexandru, Arazo, Eric, Albert, Paul, Smeaton, Alan F, McGuinness, Kevin, O'Connor, Noel E

arXiv.org Artificial Intelligence

Self-supervised representation learning methods [4, 7, 11, 12, 14] are the standard approach for training large-scale deep neural networks (DNNs). One of the main reasons for their popularity is their ability to leverage the inherent structure of a vast unlabeled corpus during pre-training, which makes them highly suitable for transfer learning [28]. However, this comes at the cost of substantially larger model sizes, computationally expensive training strategies (longer training times, large batch sizes, etc.) [13, 28], and consequently more expensive inference. Though such strategies are effective for achieving state-of-the-art results in computer vision, they may not be practical in resource-constrained industrial settings that require lightweight models to be deployed on edge devices. To lessen the computational burden, it is common to extract (or learn) a lightweight network from an off-the-shelf pre-trained model. This has been achieved through techniques such as knowledge distillation (KD) [35], pruning [24], and dynamic computation (DC) [58]. KD methods follow a standard two-step procedure: pre-training, then distilling knowledge into a student network using a self-supervised (SS) objective [1, 21, 51] or a combination of supervised and SS objectives [54]. Pruning-based approaches rely heavily on multiple rounds of pre-train, prune, and fine-tune to obtain a lightweight network, irrespective of the objective. Methods based on dynamic/conditional computation [34, 58] again rely on a pre-trained model, obtaining a lightweight network while keeping the network topology intact via a gating mechanism. These approaches are effective, but using fine-tuning to obtain a sub-network from large pre-trained models (such as Large Language Models) can be computationally expensive and cumbersome.
Also, since downstream tasks are diverse and vary widely, any change in the task requires repeating the entire procedure multiple times, making it inefficient and less transferable.
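The gating mechanism mentioned for dynamic computation can be sketched in a few lines. This is a generic channel-gating step, assumed here for illustration rather than taken from the paper: a learned per-channel logit decides which channels are kept, so effective compute drops while the network topology stays intact.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gated_channels(features, gate_logits, threshold=0.5):
    """Hypothetical dynamic-computation gate: channels whose gate
    probability falls below the threshold are zeroed (and in a real
    implementation their convolutions would simply not be computed).
    features: (channels, H, W); gate_logits: (channels,)."""
    gates = sigmoid(gate_logits)
    mask = (gates >= threshold).astype(features.dtype)
    return features * mask[:, None, None], mask
```

At training time the gate is kept differentiable (e.g. with a sigmoid relaxation, as above); at inference the hard mask makes the skipped channels free.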


Double Attention-based Lightweight Network for Plant Pest Recognition

Janarthan, Sivasubramaniam, Thuseethan, Selvarajah, Rajasegarar, Sutharshan, Yearwood, John

arXiv.org Artificial Intelligence

Timely recognition of plant pests from field images is essential for avoiding potential losses in crop yield. Traditional convolutional neural network-based deep learning models demand high computational capability and require large numbers of labelled samples for each pest type during training. On the other hand, existing lightweight network-based approaches struggle to classify pests correctly because of the common characteristics of, and high similarity between, multiple plant pests. In this work, a novel double attention-based lightweight deep learning architecture is proposed to automatically recognize different plant pests. The lightweight network enables faster training on small datasets, while the double attention module improves performance by focusing on the most pertinent information. The proposed approach achieves 96.61%, 99.08% and 91.60% on three variants of two publicly available datasets with 5869, 545 and 500 samples, respectively. Moreover, the comparison results reveal that the proposed approach consistently outperforms existing approaches on both small and large datasets.


Improved lightweight identification of agricultural diseases based on MobileNetV3

Jiang, Yuhang, Tong, Wenping

arXiv.org Artificial Intelligence

At present, agricultural pest and disease identification models are often not lightweight enough and are therefore difficult to deploy. Building on MobileNetV3, this paper introduces the Coordinate Attention block. The parameters of MobileNetV3-large are reduced by 22%, the model size by 19.7%, and the accuracy is improved by 0.92%. The parameters of MobileNetV3-small are reduced by 23.4%, the model size by 18.3%, and the accuracy is increased by 0.40%. In addition, the improved MobileNetV3-small was ported to a Jetson Nano for testing. The accuracy increased by 2.48% to 98.31%, and the inference speed increased by 7.5%. This provides a reference for deploying agricultural pest identification models on embedded devices.
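The distinguishing step of Coordinate Attention is pooling the feature map along height and width separately, so the attention weights keep positional information in each direction. The sketch below keeps only that pooling-and-reweighting skeleton in numpy; the learned 1x1 convolutions and channel reduction of the actual block (Hou et al.) are omitted for brevity.

```python
import numpy as np

def coordinate_attention(x):
    """Skeleton of a Coordinate Attention block on a (C, H, W) feature
    map: direction-aware average pooling along W and along H produces
    per-row and per-column descriptors, which (after sigmoid) reweight
    the input. The real block inserts shared 1x1 convs before the gates."""
    c, h, w = x.shape
    pool_h = x.mean(axis=2, keepdims=True)   # (C, H, 1): one value per row
    pool_w = x.mean(axis=1, keepdims=True)   # (C, 1, W): one value per column
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    # broadcasting applies row gates and column gates jointly
    return x * sig(pool_h) * sig(pool_w)
```

Because the two pooled descriptors are only (C, H) and (C, W) values, the block adds very little compute, which is consistent with the abstract's simultaneous parameter reduction and accuracy gain.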