Wang, Yuqing
Formation Control of Multi-agent System with Local Interaction and Artificial Potential Field
Zhao, Luoyin, Yan, Zheping, Wang, Yuqing, Yeow, Raye Chen-Hua
A novel local interaction control method (LICM) is proposed in this paper to realize the formation control of multi-agent system (MAS). A local interaction leader follower (LILF) structure is provided by coupling the advantages of information consensus and leader follower frame, the agents can obtain the state information of the leader by interacting with their neighbours, which will reduce the communication overhead of the system and the dependence on a single node of the topology. In addition, the artificial potential field (APF) method is introduced to achieve obstacle avoidance and collision avoidance between agents. Inspired by the stress response of animals, a stress response mechanism-artificial potential field (SRM-APF) is proposed, which will be triggered when the local minimum problem of APF occurs. Ultimately, the simulation experiments of three formation shapes, including triangular formation, square formation and hexagonal formation, validate the effectiveness of the proposed method.
Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
Wang, Yuqing, Yang, Xiao
With the rapid expansion of cloud computing infrastructure, energy consumption has become a critical challenge, driving the need for accurate and efficient prediction models. This study proposes a novel Vector Weighted Average Kernel Extreme Learning Machine (VWAA-KELM) model to enhance energy consumption prediction in cloud computing environments. By integrating a vector weighted average algorithm (VWAA) with kernel extreme learning machine (KELM), the proposed model dynamically adjusts feature weights and optimizes kernel functions, significantly improving prediction accuracy and generalization. Experimental results demonstrate the superior performance of VWAA-KELM: 94.7% of test set prediction errors fall within [0, 50] units, with only three cases exceeding 100 units, indicating strong stability. The model achieves a coefficient of determination (R2) of 0.987 in the training set (RMSE = 28.108, RPD = 8.872) and maintains excellent generalization with R2 = 0.973 in the test set (RMSE = 43.227, RPD = 6.202). Visual analysis confirms that predicted values closely align with actual energy consumption trends, avoiding overfitting while capturing nonlinear dependencies. A key innovation of this study is the introduction of adaptive feature weighting, allowing the model to dynamically assign importance to different input parameters, thereby enhancing high-dimensional data processing. This advancement provides a scalable and efficient approach for optimizing cloud data center energy consumption. Beyond cloud computing, the proposed hybrid framework has broader applications in Internet of Things (IoT) and edge computing, supporting real-time energy management and intelligent resource allocation.
Research on Edge Computing and Cloud Collaborative Resource Scheduling Optimization Based on Deep Reinforcement Learning
Wang, Yuqing, Yang, Xiao
This study addresses the challenge of resource scheduling optimization in edge-cloud collaborative computing using deep reinforcement learning (DRL). The proposed DRL-based approach improves task processing efficiency, reduces overall processing time, enhances resource utilization, and effectively controls task migrations. Experimental results demonstrate the superiority of DRL over traditional scheduling algorithms, particularly in managing complex task allocation, dynamic workloads, and multiple resource constraints. Despite its advantages, further improvements are needed to enhance learning efficiency, reduce training time, and address convergence issues. Future research should focus on increasing the algorithm's fault tolerance to handle more complex and uncertain scheduling scenarios, thereby advancing the intelligence and efficiency of edge-cloud computing systems.
Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM
Wang, Yuqing, Yang, Xiao
Abstract: Traditional security protection methods struggle to address sophisticated attack vectors in large - scale distributed systems, particularly when balancing detection accuracy with data privacy concerns. This paper presents a novel distributed security threat detection system that integrates federated learning with multimodal large language models (LLMs). Our system leverages federated learning to ensure data privacy while employing multimodal LLMs to process heterogeneous data sources including netwo rk traffic, system logs, images, and sensor data. Experimental evaluation on a 10TB distributed dataset demonstrates that our approach achieves 96.4% detection accuracy, outperforming traditional baseline models by 4.1 percentage points. The system reduces both false positive and false negative rates by 1.8 and 2.4 percentage points respectively. Performance analysis shows that our system maintains efficient processing capabilities in distributed environments, requiring 180 seconds for model training and 3. 8 seconds for threat detection across the distributed network. These results demonstrate significant improvements in detection accuracy and computational efficiency while preserving data privacy, suggesting strong potential for real - world deployment in lar ge - scale security systems.
Research on Enhancing Cloud Computing Network Security using Artificial Intelligence Algorithms
Wang, Yuqing, Yang, Xiao
Cloud computing environments are increasingly vulnerable to security threats such as distributed denial-of-service (DDoS) attacks and SQL injection. Traditional security mechanisms, based on rule matching and feature recognition, struggle to adapt to evolving attack strategies. This paper proposes an adaptive security protection framework leveraging deep learning to construct a multi-layered defense architecture. The proposed system is evaluated in a real-world business environment, achieving a detection accuracy of 97.3%, an average response time of 18 ms, and an availability rate of 99.999%. Experimental results demonstrate that the proposed method significantly enhances detection accuracy, response efficiency, and resource utilization, offering a novel and effective approach to cloud computing security.
Machine Learning-Based Cloud Computing Compliance Process Automation
Wang, Yuqing, Yang, Xiao
Cloud computing adoption across industries has revolutionized enterprise operations while introducing significant challenges in compliance management. Organizations must continuously meet evolving regulatory requirements such as GDPR and ISO 27001, yet traditional manual review processes have become increasingly inadequate for modern business scales. This paper presents a novel machine learning-based framework for automating cloud computing compliance processes, addressing critical challenges including resource-intensive manual reviews, extended compliance cycles, and delayed risk identification. Our proposed framework integrates multiple machine learning technologies, including BERT-based document processing (94.5% accuracy), One-Class SVM for anomaly detection (88.7% accuracy), and an improved CNN-LSTM architecture for sequential compliance data analysis (90.2% accuracy). Implementation results demonstrate significant improvements: reducing compliance process duration from 7 days to 1.5 days, improving accuracy from 78% to 93%, and decreasing manual effort by 73.3%. A real-world deployment at a major securities firm validated these results, processing 800,000 daily transactions with 94.2% accuracy in risk identification.
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Yuan, Jingyang, Gao, Huazuo, Dai, Damai, Luo, Junyu, Zhao, Liang, Zhang, Zhengyan, Xie, Zhenda, Wei, Y. X., Wang, Lean, Xiao, Zhiping, Wang, Yuqing, Ruan, Chong, Zhang, Ming, Liang, Wenfeng, Zeng, Wangding
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving efficiency while maintaining model capabilities. We present NSA, a Natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling. NSA employs a dynamic hierarchical sparse strategy, combining coarse-grained token compression with fine-grained token selection to preserve both global context awareness and local precision. Our approach advances sparse attention design with two key innovations: (1) We achieve substantial speedups through arithmetic intensity-balanced algorithm design, with implementation optimizations for modern hardware. (2) We enable end-to-end training, reducing pretraining computation without sacrificing model performance. As shown in Figure 1, experiments show the model pretrained with NSA maintains or exceeds Full Attention models across general benchmarks, long-context tasks, and instruction-based reasoning. Meanwhile, NSA achieves substantial speedups over Full Attention on 64k-length sequences across decoding, forward propagation, and backward propagation, validating its efficiency throughout the model lifecycle.
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Xia, Haotian, Yang, Zhengbang, Zou, Junbo, Tracy, Rhys, Wang, Yuqing, Lu, Chi, Lai, Christopher, He, Yanjun, Shao, Xun, Xie, Zhuoqing, Wang, Yuan-fang, Shen, Weining, Chen, Hanjie
Multimodal Large Language Models (MLLMs) are advancing the ability to reason about complex sports scenarios by integrating textual and visual information. To comprehensively evaluate their capabilities, we introduce SPORTU, a benchmark designed to assess MLLMs across multi-level sports reasoning tasks. SPORTU comprises two key components: SPORTU-text, featuring 900 multiple-choice questions with human-annotated explanations for rule comprehension and strategy understanding. This component focuses on testing models' ability to reason about sports solely through question-answering (QA), without requiring visual inputs; SPORTU-video, consisting of 1,701 slow-motion video clips across 7 different sports and 12,048 QA pairs, designed to assess multi-level reasoning, from simple sports recognition to complex tasks like foul detection and rule application. We evaluate four prevalent LLMs mainly utilizing few-shot learning paradigms supplemented by chain-of-thought (CoT) prompting on the SPORTU-text part. We evaluate four LLMs using few-shot learning and chain-of-thought (CoT) prompting on SPORTU-text. GPT-4o achieves the highest accuracy of 71%, but still falls short of human-level performance, highlighting room for improvement in rule comprehension and reasoning. The evaluation for the SPORTU-video part includes 7 proprietary and 6 open-source MLLMs. Experiments show that models fall short on hard tasks that require deep reasoning and rule-based understanding. Claude-3.5-Sonnet performs the best with only 52.6% accuracy on the hard task, showing large room for improvement. We hope that SPORTU will serve as a critical step toward evaluating models' capabilities in sports understanding and reasoning.
Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Xu, Zhenghao, Wang, Yuqing, Zhao, Tuo, Ward, Rachel, Tao, Molei
We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, we prove that gradient descent (GD) can find a pair of $\epsilon$-optimal solutions $\mathbf{X}_T\in\mathbb{R}^{m\times d}$ and $\mathbf{Y}_T\in\mathbb{R}^{n\times d}$, where $d\geq r$, satisfying $\lVert\mathbf{X}_T\mathbf{Y}_T^\top-\mathbf{A}\rVert_\mathrm{F}\leq\epsilon\lVert\mathbf{A}\rVert_\mathrm{F}$ in $T=O(\kappa^2\log\frac{1}{\epsilon})$ iterations with high probability, where $\kappa$ denotes the condition number of $\mathbf{A}$. Furthermore, we prove that Nesterov's accelerated gradient (NAG) attains an iteration complexity of $O(\kappa\log\frac{1}{\epsilon})$, which is the best-known bound of first-order methods for rectangular matrix factorization. Different from small balanced random initialization in the existing literature, we adopt an unbalanced initialization, where $\mathbf{X}_0$ is large and $\mathbf{Y}_0$ is $0$. Moreover, our initialization and analysis can be further extended to linear neural networks, where we prove that NAG can also attain an accelerated linear convergence rate. In particular, we only require the width of the network to be greater than or equal to the rank of the output label matrix. In contrast, previous results achieving the same rate require excessive widths that additionally depend on the condition number and the rank of the input data matrix.
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
Xiong, Tianwei, Wang, Yuqing, Zhou, Daquan, Lin, Zhijie, Feng, Jiashi, Liu, Xihui
The efficacy of video generation models heavily depends on the quality of their training datasets. Most previous video generation models are trained on short video clips, while recently there has been increasing interest in training long video generation models directly on longer videos. However, the lack of such high-quality long videos impedes the advancement of long video generation. To promote research in long video generation, we desire a new dataset with four key features essential for training long video generation models: (1) long videos covering at least 10 seconds, (2) long-take videos without cuts, (3) large motion and diverse contents, and (4) temporally dense captions. To achieve this, we introduce a new pipeline for selecting high-quality long-take videos and generating temporally dense captions. Specifically, we define a set of metrics to quantitatively assess video quality including scene cuts, dynamic degrees, and semantic-level quality, enabling us to filter high-quality long-take videos from a large amount of source videos. Subsequently, we develop a hierarchical video captioning pipeline to annotate long videos with temporally-dense captions. With this pipeline, we curate the first long-take video dataset, LVD-2M, comprising 2 million long-take videos, each covering more than 10 seconds and annotated with temporally dense captions. We further validate the effectiveness of LVD-2M by fine-tuning video generation models to generate long videos with dynamic motions. We believe our work will significantly contribute to future research in long video generation.