 cloud platform


Advancing Software Security and Reliability in Cloud Platforms through AI-based Anomaly Detection

Saleh, Sabbir M., Sayem, Ibrahim Mohammed, Madhavji, Nazim, Steinbacher, John

arXiv.org Artificial Intelligence

Continuous Integration/Continuous Deployment (CI/CD) is fundamental to advanced software development, supporting faster and more efficient delivery of code changes into cloud environments. However, securing the CI/CD pipeline remains challenging, and incidents (e.g., DDoS, bot, and Log4j attacks) continue to occur in cloud environments. While plenty of literature discusses static security testing and CI/CD practices, only a few studies analyse network traffic patterns to detect cyberattacks. This research aims to enhance CI/CD pipeline security by implementing AI (Artificial Intelligence)-based anomaly detection. The goal is to identify unusual behaviour or deviations from normal network traffic patterns in pipeline and cloud platforms. The system is intended to integrate into the workflow and continuously monitor pipeline activities and cloud infrastructure. Additionally, the research explores adaptive response mechanisms to mitigate detected anomalies or security threats. This research employed two popular network traffic datasets, CSE-CIC-IDS2018 and CSE-CIC-IDS2017. We implemented a combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to detect unusual traffic patterns. We achieved accuracies of 98.69% and 98.30% and generated log files at different CI/CD pipeline stages that reflect the detected network anomalies. These results help address security challenges in modern DevOps practices and contribute to advancing software security and reliability.


Research on Cloud Platform Network Traffic Monitoring and Anomaly Detection System based on Large Language Models

Yang, Ze, Jin, Yihong, Liu, Juntian, Xu, Xinhe, Zhang, Yihan, Ji, Shuyang

arXiv.org Artificial Intelligence

Rapidly evolving cloud platforms and the escalating complexity of network traffic demand effective network traffic monitoring and anomaly detection to ensure network security and performance. This paper introduces a large language model (LLM)-based network traffic monitoring and anomaly detection system. In addition to existing models such as autoencoders and decision trees, we harness large language models to process sequence data from network traffic, which lets us better capture underlying complex patterns as well as slight fluctuations in the dataset. We show that, for a given detection task, a hybrid model that incorporates the attention mechanism of the transformer architecture into a supervised learning framework is needed to achieve better accuracy. A pre-trained large language model analyzes and predicts the probable network traffic, and an anomaly detection layer that considers temporality and context is added. Moreover, we present a novel transfer learning-based methodology that helps the model adapt quickly to unknown network structures and adversarial conditions without requiring extensive labeled datasets. Experimental results show that the designed model outperforms traditional methods in detection accuracy and computational efficiency, effectively identifies various network anomalies such as zero-day attacks and traffic congestion patterns, and significantly reduces the false positive rate.


Research on Large Language Model Cross-Cloud Privacy Protection and Collaborative Training based on Federated Learning

Yang, Ze, Jin, Yihong, Zhang, Yihan, Liu, Juntian, Xu, Xinhe

arXiv.org Artificial Intelligence

The fast development of large language models (LLMs) and the popularization of cloud computing have raised growing concerns about privacy and data security, which are key challenges in cross-cloud model deployment and training. We present a new framework that addresses these issues and enables privacy-preserving collaborative training between distributed clouds based on federated learning. Our mechanism combines cutting-edge cryptographic primitives, dynamic model aggregation techniques, and cross-cloud data harmonization solutions to improve the security, efficiency, and scalability of the traditional federated learning paradigm. Furthermore, we propose a hybrid aggregation scheme to mitigate the threat of data leakage and to optimize the aggregation of model updates, thus substantially improving model effectiveness and stability. Experimental results demonstrate that the training efficiency, privacy protection, and model accuracy of the proposed model compare favorably to those of the traditional federated learning method.
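
For orientation, the aggregation step of the traditional federated learning baseline mentioned above can be sketched as follows. This is a minimal sketch of plain federated averaging (FedAvg), not the hybrid aggregation scheme the abstract proposes, whose details are not given here. The function name `fed_avg`, the dictionary-of-lists weight format, and the per-cloud sample counts are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fed_avg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count.

    `updates` holds one (weights, num_samples) pair per participating cloud.
    Only model weights are exchanged; raw training data never leaves a client.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one client with samples is required")

    # Start from zeros with the same shape as the first client's weights.
    first_weights = updates[0][0]
    aggregated = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n in updates:
        share = n / total  # this client's contribution to the global model
        for name, vals in weights.items():
            for i, value in enumerate(vals):
                aggregated[name][i] += share * value
    return aggregated


# Two hypothetical clouds: the second holds three times as much data.
global_weights = fed_avg([
    ({"w": [1.0, 2.0]}, 1),
    ({"w": [3.0, 6.0]}, 3),
])
```

A production system would add secure aggregation or encryption around this step; the sketch shows only the weighted average itself.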


Reviews: Positive-Unlabeled Compression on the Cloud

Neural Information Processing Systems

The paper targets the application of network compression using a cloud platform. Instead of uploading all the training data onto the platform, the paper suggests uploading a small portion of the data as positive (P) data and using larger datasets already on the platform as unlabeled (U) data. After a PU classifier is trained, it is used to select more P data from the U data. The selected data, together with the original data, are then used in a knowledge distillation framework to compress the original network. The experimental results show that, on three widely used datasets, the compressed network's performance is close to that of the original deep neural network trained on all data.
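
The knowledge distillation step can be illustrated with a minimal sketch. The review does not state which distillation loss the paper uses, so the code below assumes the commonly used temperature-softened KL divergence between teacher and student outputs; the function names, the temperature value, and the example logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened outputs, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for one sample over three classes.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, which is what lets the smaller network be trained on the selected data without ground-truth labels.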


Optimization and Application of Cloud-based Deep Learning Architecture for Multi-Source Data Prediction

Zhang, Yang, Wang, Fa, Huang, Xin, Li, Xintao, Liu, Sibei, Zhang, Hansong

arXiv.org Artificial Intelligence

This study develops a cloud-based deep learning system for early prediction of diabetes, leveraging the distributed computing capabilities of the AWS cloud platform and deep learning technologies to achieve efficient and accurate risk assessment. The system utilizes EC2 p3.8xlarge GPU instances to accelerate model training, reducing training time by 93.2% while maintaining a prediction accuracy of 94.2%. With an automated data processing and model training pipeline built using Apache Airflow, the system can complete end-to-end updates within 18.7 hours. In clinical applications, the system demonstrates a prediction accuracy of 89.8%, sensitivity of 92.3%, and specificity of 95.1%. Early interventions based on predictions lead to a 37.5% reduction in diabetes incidence among the target population. The system's high performance and scalability provide strong support for large-scale diabetes prevention and management, showcasing significant public health value.
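
The three clinical metrics quoted above are related through the confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of true positives that were detected
        "specificity": tn / (tn + fp),    # share of true negatives that were cleared
    }


# Hypothetical screening cohort: 100 positive cases and 200 negative cases.
metrics = binary_metrics(tp=80, fp=10, tn=190, fn=20)
```

Under these definitions, accuracy measured on a single evaluation set is a prevalence-weighted mean of sensitivity and specificity, so it falls between the two.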


Research on Key Technologies for Cross-Cloud Federated Training of Large Language Models

Yang, Haowei, Sui, Mingxiu, Liu, Shaobo, Qian, Xinyue, Zhang, Zhaoyang, Liu, Bingying

arXiv.org Artificial Intelligence

Large language models have achieved remarkable success in areas such as machine translation, speech recognition, and text generation. However, training these large models typically requires vast computational resources and data, which not only places high demands on the resources of a single cloud platform but can also lead to computational bottlenecks, latency issues, and cost pressures [1]. Cross-cloud federated training has emerged as an effective solution to these challenges. By leveraging the computational resources of multiple cloud platforms, cross-cloud federated training enables distributed processing of large datasets and synchronous model parameter updates, thereby accelerating the training process. Implementing cross-cloud federated training involves addressing several key technical challenges, including efficiently allocating and managing the computational resources of cloud platforms, optimizing data communication between clouds, and ensuring data privacy and security during the training process [2].


AI-Driven Resource Allocation Framework for Microservices in Hybrid Cloud Platforms

Barua, Biman, Kaiser, M. Shamim

arXiv.org Artificial Intelligence

The increasing demand for scalable, efficient resource management in hybrid cloud environments has led to the exploration of AI-driven approaches for dynamic resource allocation. This paper presents an AI-driven framework for resource allocation among microservices in hybrid cloud platforms. The framework employs reinforcement learning (RL)-based resource utilization optimization to reduce costs and improve performance. It integrates AI models with cloud management tools to address the challenges of dynamic scaling and cost-efficient, low-latency service delivery. The reinforcement learning model continuously adjusts provisioned resources as required by the microservices and predicts future consumption trends to minimize both under- and over-provisioning of resources. Preliminary simulation results indicate that AI-driven provisioning can reduce expenditure by up to 30-40% compared to manual provisioning and threshold-based auto-scaling approaches. Resource utilization efficiency is also expected to improve by 20%-30%, with a corresponding latency cut of 15%-20% during peak demand periods. This study compares the AI-driven approach with existing static and rule-based resource allocation methods, demonstrating that the new model outperforms them in flexibility and real-time responsiveness. The results indicate that reinforcement learning can further improve the optimization of hybrid cloud platforms, offering a 25-35% improvement in cost efficiency and better scaling for microservice-based applications. The proposed framework is a robust and scalable solution for managing cloud resources in dynamic and performance-critical environments.
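
As an illustration of how a reinforcement learning agent can learn to avoid both under- and over-provisioning, the following is a toy tabular Q-learning sketch. It is not the framework described in the abstract: the three load levels, the replica counts, the cost and penalty values, the random-walk demand model, and all hyperparameters are assumptions chosen for illustration.

```python
import random
from typing import Dict, Tuple

LOADS = [0, 1, 2]      # hypothetical demand levels: low, medium, high
ACTIONS = [1, 2, 3]    # hypothetical number of replicas to provision


def reward(load: int, replicas: int) -> float:
    """Negative cost: pay per replica, plus a penalty per missing replica."""
    needed = load + 1
    running_cost = 1.0 * replicas
    shortfall_penalty = 5.0 * max(0, needed - replicas)
    return -(running_cost + shortfall_penalty)


def next_load(load: int, rng: random.Random) -> int:
    """Demand follows a clipped random walk, independent of the action."""
    return min(2, max(0, load + rng.choice([-1, 0, 1])))


def train(steps: int = 50000, alpha: float = 0.05, gamma: float = 0.5,
          epsilon: float = 0.1, seed: int = 0) -> Dict[Tuple[int, int], float]:
    """Learn Q(load, replicas) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in LOADS for a in ACTIONS}
    load = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                       # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(load, a)])  # exploit
        r = reward(load, action)
        new_load = next_load(load, rng)
        best_next = max(q[(new_load, a)] for a in ACTIONS)
        # Standard Q-learning update toward the bootstrapped target.
        q[(load, action)] += alpha * (r + gamma * best_next - q[(load, action)])
        load = new_load
    return q


def greedy_policy(q: Dict[Tuple[int, int], float]) -> Dict[int, int]:
    """Pick the highest-valued replica count for each load level."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in LOADS}


policy = greedy_policy(train())
```

In this toy setting, the cost-minimizing choice is to provision exactly `load + 1` replicas, since one extra replica costs less than one missing replica. A threshold-based auto-scaler would hard-code such a rule, whereas the agent has to discover it from the reward signal alone.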


Security of and by Generative AI platforms

Hayagreevan, Hari, Khamaru, Souvik

arXiv.org Artificial Intelligence

This whitepaper highlights the dual importance of securing generative AI (genAI) platforms and leveraging genAI for cybersecurity. As genAI technologies proliferate, their misuse poses significant risks, including data breaches, model tampering, and malicious content generation. Securing these platforms is critical to protect sensitive data, ensure model integrity, and prevent adversarial attacks. Simultaneously, genAI presents opportunities for enhancing security by automating threat detection, vulnerability analysis, and incident response. The whitepaper explores strategies for robust security frameworks around genAI systems, while also showcasing how genAI can empower organizations to anticipate, detect, and mitigate sophisticated cyber threats.


Augmenting train maintenance technicians with automated incident diagnostic suggestions

Tod, Georges, Bruggeman, Jean, Bevernage, Evert, Moelans, Pieter, Eeckhout, Walter, Glineur, Jean-Luc

arXiv.org Machine Learning

Train operational incidents have so far been diagnosed individually and manually by train maintenance technicians. To help maintenance crews respond faster and prioritize tasks, a learning machine is developed and deployed in production to suggest diagnostics to train technicians on their phones, tablets or laptops as soon as a train incident is declared. A feedback loop takes into account the actual diagnosis made by designated train maintenance experts to refine the learning machine. By formulating the problem as a discrete set classification task, feature engineering methods are proposed to extract physically plausible sets of events from traces generated on board railway vehicles. These sets feed an original ensemble classifier that classifies incidents by their potential technical cause. Finally, the resulting model is trained and validated using real operational data and deployed on a cloud platform. Future work will explore how the extracted sets of events can be used to avoid incidents by assisting human experts in the creation of predictive maintenance alerts.


Google to add AI models to its cloud platform

The Japan Times

Alphabet's Google is adding artificial intelligence tools from companies including Meta Platforms and Anthropic to its cloud platform, weaving more generative AI into its products and positioning itself as a one-stop shop for cloud customers seeking to tap into the technology. Google's cloud clients will be able to access Meta's Llama 2 large language model, as well as AI startup Anthropic's Claude 2 chatbot, to customize with enterprise data for their own apps and services. The move, announced Tuesday at Google's Next '23 event in San Francisco, is part of the company's effort to position its platform as one where customers have the freedom to choose an AI model that best meets their needs, whether from the company itself or one of its partners. More than 100 powerful AI models and tools are now available to Google Cloud clients, the company said. The company also announced wider availability of its Duet AI product for customers of its Workspace productivity suite, with access for the public to follow later this year.