Collaborating Authors

 Sun, Sijin


FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection

arXiv.org Artificial Intelligence

Camouflaged Object Detection (COD) is challenging because camouflaged objects closely resemble their surroundings, which complicates identification. Existing methods mainly rely on local spatial features and fail to capture global information, while Transformers increase computational costs. To address this, the Frequency-Assisted Mamba-Like Linear Attention Network (FMNet) is proposed, which leverages frequency-domain learning to efficiently capture global features and mitigate ambiguity between objects and the background. FMNet introduces the Multi-Scale Frequency-Assisted Mamba-Like Linear Attention (MFM) module, which integrates frequency and spatial features through a multi-scale structure to handle scale variations while reducing computational complexity. Additionally, the Pyramidal Frequency Attention Extraction (PFAE) module and the Frequency Reverse Decoder (FRD) enhance semantics and reconstruct features. Experimental results demonstrate that FMNet outperforms existing methods on multiple COD datasets, showing advantages in both performance and efficiency. Code is available at https://anonymous.4open.science/r/FMNet-3CE5.
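As a rough illustration of why frequency-domain processing gives access to global information, the sketch below applies a low-pass filter to a 1D signal with a naive discrete Fourier transform. Each frequency coefficient is a sum over every input sample, so a single operation on the spectrum affects the whole signal. This is a generic example, not FMNet's MFM module; the function names `dft`, `idft`, and `low_pass` are hypothetical.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform; each output sums over all inputs."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
        for t in range(n)
    ]


def low_pass(signal, keep):
    """Keep the DC term and the `keep` lowest frequencies, zero the rest."""
    spectrum = dft(signal)
    n = len(spectrum)
    # Indices k and n - k form a conjugate pair for real-valued input,
    # so both are kept to make the filtered signal real again.
    filtered = [c if (k <= keep or k >= n - keep) else 0 for k, c in enumerate(spectrum)]
    return idft(filtered)


if __name__ == "__main__":
    # A rapidly alternating signal has no low-frequency content.
    print(low_pass([1.0, -1.0, 1.0, -1.0], keep=0))
```

With `keep=0` only the mean of the signal survives, which is the simplest example of a global statistic obtained from one frequency coefficient.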


Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition

arXiv.org Artificial Intelligence

Maritime Multi-Scene Recognition is crucial for enhancing the capabilities of intelligent marine robotics, particularly in applications such as marine conservation, environmental monitoring, and disaster response. However, this task presents significant challenges due to environmental interference, where marine conditions degrade image quality, and the complexity of maritime scenes, which requires deeper reasoning for accurate recognition. Pure vision models are insufficient to address these issues. To overcome these limitations, we propose a novel multimodal Artificial Intelligence (AI) framework that integrates image data with textual descriptions and classification vectors generated by a Multimodal Large Language Model (MLLM) to provide richer semantic understanding and improve recognition accuracy. Our framework employs an efficient multimodal fusion mechanism to further enhance model robustness and adaptability in complex maritime environments. Experimental results show that our model achieves 98% accuracy, surpassing previous SOTA models by 3.5%. To optimize deployment on resource-constrained platforms, we adopt activation-aware weight quantization (AWQ) as a lightweight technique, reducing the model size to 68.75 MB with only a 0.5% accuracy drop while significantly lowering computational overhead. This work provides a high-performance solution for real-time maritime scene recognition, enabling Autonomous Surface Vehicles (ASVs) to support environmental monitoring and disaster response in resource-limited settings.
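To make the size reduction from weight quantization concrete, here is a minimal sketch of plain symmetric round-to-nearest 4-bit group quantization. It is not AWQ itself: AWQ additionally uses activation statistics to rescale important weight channels before quantizing, and that step is omitted here. The function names, the group size of 128, and the 2-byte scale are illustrative assumptions, not values from the paper.

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp to the representable range after rounding.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def packed_size_bytes(num_weights, bits=4, group_size=128, scale_bytes=2):
    """Storage estimate: bit-packed codes plus one scale per group."""
    num_groups = -(-num_weights // group_size)  # ceiling division
    code_bytes = (num_weights * bits + 7) // 8
    return code_bytes + num_groups * scale_bytes


if __name__ == "__main__":
    codes, scale = quantize_group([3.5, -1.5, 0.0, 0.5])
    print(codes, scale)
    print(dequantize_group(codes, scale))
    # Compare against 4 bytes per weight for 32-bit floats.
    print(packed_size_bytes(1024), 1024 * 4)
```

Under these assumptions, 1024 weights take 528 bytes instead of 4096 bytes in 32-bit floating point, roughly an eightfold reduction before any other model components are counted.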


Self-Adaptive Gamma Context-Aware SSM-based Model for Metal Defect Detection

arXiv.org Artificial Intelligence

The quality of metal surfaces is critical in various industrial applications, including aerospace, manufacturing, and container transportation. Surface defects, such as cracks, dents, and scratches, not only compromise the structural integrity and aesthetics of metal products but also lead to significant economic losses if left undetected. As a result, the accurate and efficient detection of metal surface defects has become an essential task in industrial quality control. In recent years, the adoption of deep learning techniques has significantly advanced the performance of defect detection systems [1]. Convolutional neural networks (CNNs) and transformer-based models have demonstrated exceptional capabilities in handling complex image-based tasks, enabling automated and reliable defect detection. However, several challenges remain: 1) Metal defects often exhibit varied and localized features, making effective multi-scale feature aggregation vital for improving detection accuracy.


HREB-CRF: Hierarchical Reduced-bias EMA for Chinese Named Entity Recognition

arXiv.org Artificial Intelligence

Incorrect boundary division, complex semantic representation, and differences in pronunciation and meaning often lead to errors in Chinese Named Entity Recognition (CNER). To address these issues, this paper proposes the HREB-CRF framework: Hierarchical Reduced-bias EMA with CRF. The proposed method amplifies word boundaries and pools long text gradients through an exponentially fixed-bias weighted average of local and global hierarchical attention. Experimental results on the MSRA, Resume, and Weibo datasets show strong F1 scores, outperforming the baseline model by 1.1%, 1.6%, and 9.8%, respectively. This improvement in F1 is evidence of the approach's effectiveness and robustness in CNER tasks.
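For readers unfamiliar with the averaging idea, the sketch below shows a textbook exponential moving average with the standard bias correction for a zero-initialized average. It assumes that "EMA" in the framework name denotes an exponential moving average; it is not the paper's hierarchical reduced-bias formulation, and the function name `ema` and its parameters are hypothetical.

```python
def ema(values, alpha, bias_correction=True):
    """Exponential moving average over a sequence of numbers.

    alpha is the weight given to the newest value (0 < alpha <= 1).
    The running average starts at zero, which biases early outputs
    toward zero; dividing by 1 - (1 - alpha)**t removes that bias.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    outputs = []
    avg = 0.0
    for t, x in enumerate(values, start=1):
        avg = alpha * x + (1.0 - alpha) * avg
        if bias_correction:
            outputs.append(avg / (1.0 - (1.0 - alpha) ** t))
        else:
            outputs.append(avg)
    return outputs


if __name__ == "__main__":
    constant = [1.0, 1.0, 1.0]
    print(ema(constant, alpha=0.5, bias_correction=False))
    print(ema(constant, alpha=0.5, bias_correction=True))
```

On a constant input, the uncorrected average climbs slowly from zero toward the true value, while the corrected version returns the true value from the first step.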