serverless function
FLStore: Efficient Federated Learning Storage for non-training workloads
Khan, Ahmad Faraz, Fountain, Samuel, Abdelmoniem, Ahmed M., Butt, Ali R., Anwar, Ali
Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection, with an aggregator server coordinating training, aggregating model updates, and storing metadata across rounds. In addition to training, a substantial part of FL systems is the non-training workloads such as scheduling, personalization, clustering, debugging, and incentivization. Most existing systems rely on the aggregator to handle non-training workloads and use cloud services for data storage. This results in high latency and increased costs, because non-training workloads rely on large volumes of metadata, including weight parameters from client updates, hyperparameters, and aggregated updates across rounds. We propose FLStore, a serverless framework for efficient FL non-training workloads and storage. FLStore unifies the data and compute planes on a serverless cache, enabling locality-aware execution via tailored caching policies to reduce latency and costs. Per our evaluations, compared to a cloud object store based aggregator server, FLStore reduces per-request average latency by 71% and costs by 92.45%, with peak improvements of 99.7% and 98.8%, respectively. Compared to an in-memory cloud cache based aggregator server, FLStore reduces average latency by 64.6% and costs by 98.83%, with peak improvements of 98.8% and 99.6%, respectively. FLStore integrates seamlessly with existing FL frameworks with minimal modifications, while also being fault-tolerant and highly scalable.
Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
As data-intensive applications grow, batch processing in limited-resource environments faces scalability and resource management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
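The decomposition described in this abstract can be illustrated with a minimal local sketch. It is not the paper's implementation: a thread pool stands in for parallel serverless invocations, and `classify_batch` is a hypothetical toy rule that takes the place of DistilBERT inference.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def classify_batch(batch):
    """Hypothetical stand-in for one serverless invocation scoring a batch."""
    # Toy rule instead of a real model: "positive" if the text contains "good".
    return ["positive" if "good" in text.lower() else "negative" for text in batch]


def run_parallel(reviews, batch_size=2, max_workers=4):
    """Fan batches out to workers and gather the labels in input order."""
    batches = chunk(reviews, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(classify_batch, batches)  # map preserves batch order
        return [label for batch_labels in results for label in batch_labels]


reviews = ["Good film", "Dull plot", "Really good cast", "Too long", "Not bad"]
labels = run_parallel(reviews)
```

On a real platform each call to `classify_batch` would be a separate function invocation, so wall-clock time is bounded by the slowest batch rather than the sum of all batches.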
Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing
Liu, Mengfan, Wang, Wei, Wu, Chuan
With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models, with their parallel expert networks, have become a dominant type of model architecture for building large models. Serving large MoE models on serverless computing is potentially beneficial, but has been underexplored due to substantial challenges in handling the skewed expert popularity and scatter-gather communication bottleneck in MoE model execution, for cost-efficient serverless MoE deployment and performance guarantee. We study optimized MoE model deployment and distributed inference serving on a serverless platform, effectively predicting expert selection, pipelining communication with model execution, and minimizing the overall billed cost of serving MoE models. Specifically, we propose a Bayesian optimization framework with multi-dimensional epsilon-greedy search to learn expert selections and optimal MoE deployment achieving optimal billed cost, including: 1) a Bayesian decision-making method for predicting expert popularity; 2) flexibly pipelined scatter-gather communication; and 3) an optimal model deployment algorithm for distributed MoE serving. Extensive experiments on AWS Lambda show that our designs reduce the billed cost of all MoE layers by at least 75.67% compared to CPU clusters while maintaining satisfactory inference throughput. Compared to LambdaML in serverless computing, our designs achieve 43.41% lower cost with a throughput decrease of at most 18.76%.
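For readers unfamiliar with epsilon-greedy search, the following is a generic one-dimensional sketch. It is not the paper's multi-dimensional Bayesian framework. The candidate memory sizes and the `toy_cost` model are hypothetical and chosen only so the example runs on its own.

```python
import random


def epsilon_greedy_search(candidates, cost_fn, epsilon=0.1, steps=200, seed=0):
    """Return the candidate with the lowest average observed cost."""
    rng = random.Random(seed)
    totals = {c: 0.0 for c in candidates}
    counts = {c: 0 for c in candidates}
    # Try every candidate once so each has an initial cost estimate.
    for c in candidates:
        totals[c] += cost_fn(c, rng)
        counts[c] += 1
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(candidates)  # explore a random candidate
        else:
            # exploit the candidate with the best average so far
            choice = min(candidates, key=lambda c: totals[c] / counts[c])
        totals[choice] += cost_fn(choice, rng)
        counts[choice] += 1
    return min(candidates, key=lambda c: totals[c] / counts[c])


def toy_cost(memory_mb, rng):
    """Hypothetical billed cost: memory x duration, with 1% measurement noise."""
    duration_s = 2048 / memory_mb + 1.0  # assumed: more memory, shorter run
    return memory_mb * duration_s * (1 + rng.uniform(-0.01, 0.01))


best = epsilon_greedy_search([512, 1024, 2048, 4096], toy_cost)
```

With probability `epsilon` the search samples a random configuration. Otherwise it re-measures the cheapest one seen so far, which keeps the cost estimates from locking onto an early noisy reading.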
Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions
Agarwal, Siddharth, Rodriguez, Maria A., Buyya, Rajkumar
In today's Function-as-a-Service offerings, a programmer is usually responsible for configuring function memory for its successful execution, which allocates proportional function resources such as CPU and network. However, right-sizing the function memory forces developers to speculate about performance and make ad-hoc configuration decisions. Recent research has highlighted that a function's input characteristics, such as input size, type and number of inputs, significantly impact its resource demand, run-time performance and costs with fluctuating workloads. This correlation further makes memory configuration a non-trivial task. On that account, an input-aware function memory allocator not only improves developer productivity by completely hiding resource-related decisions but also drives an opportunity to reduce resource wastage and offer a finer-grained cost-optimised pricing scheme. Therefore, we present MemFigLess, a serverless solution that estimates the memory requirement of a serverless function with input-awareness. The framework executes function profiling in an offline stage and trains a multi-output Random Forest Regression model on the collected metrics to invoke input-aware optimal configurations. We evaluate our work against state-of-the-art approaches on the AWS Lambda service and find that MemFigLess is able to capture the input-aware resource relationships, allocate up to 82% fewer resources, and save up to 87% in run-time costs.
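The profile-offline, predict-at-invocation idea can be sketched with a deliberately simpler model than the multi-output Random Forest the abstract describes. Here, linear interpolation over a small profile table is swapped in. The profile values, memory tiers, and headroom factor are all hypothetical.

```python
import bisect

# Hypothetical offline profile: (input size in MB, observed peak memory in MB),
# sorted by input size.
PROFILE = [(1, 120), (10, 180), (50, 400), (100, 750)]
# Hypothetical set of memory sizes the platform lets us configure.
MEMORY_TIERS_MB = [128, 256, 512, 1024, 2048]


def predict_peak_memory(input_mb):
    """Linearly interpolate peak memory between profiled input sizes."""
    sizes = [size for size, _ in PROFILE]
    if input_mb <= sizes[0]:
        return PROFILE[0][1]
    if input_mb >= sizes[-1]:
        return PROFILE[-1][1]
    hi = bisect.bisect_right(sizes, input_mb)
    (s0, m0), (s1, m1) = PROFILE[hi - 1], PROFILE[hi]
    return m0 + (m1 - m0) * (input_mb - s0) / (s1 - s0)


def choose_memory(input_mb, headroom=1.2):
    """Pick the smallest tier that covers the predicted peak plus headroom."""
    need = predict_peak_memory(input_mb) * headroom
    for tier in MEMORY_TIERS_MB:
        if tier >= need:
            return tier
    return MEMORY_TIERS_MB[-1]  # fall back to the largest tier
```

For instance, `choose_memory(30)` interpolates a peak of 290 MB, applies the 20% headroom, and selects the 512 MB tier instead of a fixed worst-case allocation.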
Detection of Compromised Functions in a Serverless Cloud Environment
Lavi, Danielle, Brodt, Oleg, Mimran, Dudu, Elovici, Yuval, Shabtai, Asaf
Serverless computing is an emerging cloud paradigm with serverless functions at its core. While serverless environments enable software developers to focus on developing applications without the need to actively manage the underlying runtime infrastructure, they open the door to a wide variety of security threats that can be challenging to mitigate with existing methods. Existing security solutions do not apply to all serverless architectures, since they require significant modifications to the serverless infrastructure or rely on third-party services for the collection of more detailed data. In this paper, we present an extendable serverless security threat detection model that leverages cloud providers' native monitoring tools to detect anomalous behavior in serverless applications. Our model aims to detect compromised serverless functions by identifying post-exploitation abnormal behavior related to different types of attacks on serverless functions, and therefore, it is a last line of defense. Our approach is not tied to any specific serverless application, is agnostic to the type of threats, and is adaptable through model adjustments. To evaluate our model's performance, we developed a serverless cybersecurity testbed in an AWS cloud environment, which includes two different serverless applications and simulates a variety of attack scenarios that cover the main security threats faced by serverless functions. Our evaluation demonstrates our model's ability to detect all implemented attacks while maintaining a negligible false alarm rate.
Bi-directional personalization reinforcement learning-based architecture with active learning using a multi-model data service for the travel nursing industry
The challenges of using inadequate online recruitment systems can be addressed with machine learning and software engineering techniques. A bi-directional personalization reinforcement learning-based architecture with active learning can recommend qualified applicants to recruiters and also enable applicants to receive personalized job recommendations. This paper focuses on how machine learning techniques can enhance the recruitment process in the travel nursing industry by helping speed up data acquisition using a multi-model data service and then providing personalized recommendations using bi-directional reinforcement learning with active learning. This need was especially evident when trying to respond to the overwhelming needs of healthcare facilities during the COVID-19 pandemic. The need for traveling nurses and other healthcare professionals was more evident during the lockdown period. A data service was architected for job feed processing using an orchestration of natural language processing (NLP) models that synthesize job-related data into a database efficiently and accurately. The multi-model data service provided the data necessary to develop a bi-directional personalization system using reinforcement learning with active learning that could recommend travel nurses and healthcare professionals to recruiters and provide job recommendations to applicants using an internally developed smart match score as a basis. The bi-directional personalization reinforcement learning-based architecture with active learning combines two personalization systems: one that runs forward to recommend qualified candidates for jobs and another that runs backward and recommends jobs for applicants.
DeF-DReL: Systematic Deployment of Serverless Functions in Fog and Cloud environments using Deep Reinforcement Learning
Dehury, Chinmaya Kumar, Poojara, Shivananda, Domanal, Shridhar, Srirama, Satish Narayana
Fog computing was introduced to mitigate the limitations of cloud computing by shifting cloud resources closer to users. The fog environment makes its limited resources available to a large number of users to deploy their serverless applications, each composed of several serverless functions. One of the primary intentions behind introducing the fog environment is to meet the demands of latency- and location-sensitive serverless applications with these limited resources. Recent research mainly focuses on assigning maximum resources to such applications from the fog node, without taking full advantage of the cloud environment. This limits the number of connected users that can be served. To address this issue, in this paper, we investigate the optimum percentage of a user's request that should be fulfilled by fog and cloud. As a result, we propose DeF-DReL, a Systematic Deployment of Serverless Functions in Fog and Cloud environments using Deep Reinforcement Learning, using several real-life parameters, such as the distance and latency of the users from the nearby fog node, the user's priority, the priority of the serverless applications and their resource demand, etc. The performance of the DeF-DReL algorithm is further compared with recent related algorithms. The simulation and comparison results clearly show its superiority over other algorithms and its applicability to real-life scenarios.
GPU-as-a-Service on KubeFlow: Fast, Scalable and Efficient ML
Machine Learning (ML) and Deep Learning (DL) involve compute and data intensive tasks. In order to maximize our model accuracy, we want to train on larger datasets, evaluate a variety of algorithms, and try out different parameters for each algorithm (hyper-parameter tuning). As our datasets and model complexity grow, so does the time we need to wait for our jobs to complete, leading to inefficient use of our time. We end up running fewer iterations and tests or working on smaller datasets as a result. NVIDIA GPUs are a great tool to accelerate our data science work.
The Rise of Serverless Computing
Cloud computing in general, and Infrastructure-as-a-Service (IaaS) in particular, have become widely accepted and adopted paradigms for computing with the offerings of virtual machines (VM) on demand. By 2020, 67% of enterprise IT infrastructure and software spending will be for cloud-based offerings [16]. A major factor in the increased adoption of the cloud by enterprise IT was its pay-as-you-go model, where a customer pays only for resources leased from the cloud provider and has the ability to get as many resources as needed with no up-front cost (elasticity) [2]. Unfortunately, the burden of scaling was left to developers and system designers, who typically used overprovisioning techniques to handle sudden surges in service requests. Studies of reported usage of cloud resources in datacenters [19] show a substantial gap between the resources that cloud customers allocate and pay for (leasing VMs), and actual resource utilization (CPU, memory, and so on). Serverless computing is emerging as a new and compelling paradigm for the deployment of cloud applications, largely due to the recent shift of enterprise application architectures to containers and microservices [23]. Using serverless gives pay-as-you-go without additional work to start and stop servers and is closer to original expectations for cloud computing to be treated as a utility [2]. Developers using serverless computing can get cost savings and scalability without needing to have a high level of cloud computing expertise that is time-consuming to acquire. Due to its simplicity and economical advantages, serverless computing is gaining popularity, as reported by the increasing rate of the "serverless" search term in Google Trends.
Its market size is estimated to grow to 7.72 billion by 2021 [10]. Most prominent cloud providers including Amazon, IBM, Microsoft, Google, and others have already released serverless computing capabilities with several additional open source efforts driven by both industry and academic institutions (for example, see CNCF Serverless Cloud Native Landscape).
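The gap between paying for allocated capacity and paying per use can be made concrete with a small worked calculation. All prices and workload figures below are hypothetical placeholders, not any provider's actual rates.

```python
def vm_cost(hours, hourly_rate):
    """Cost of a VM that is leased for `hours`, whether busy or idle."""
    return hours * hourly_rate


def serverless_cost(invocations, seconds_per_call, memory_gb, rate_per_gb_second):
    """Cost when billing covers only the memory-time actually consumed."""
    return invocations * seconds_per_call * memory_gb * rate_per_gb_second


# Hypothetical month: one always-on VM versus one million short invocations.
always_on = vm_cost(hours=720, hourly_rate=0.10)
on_demand = serverless_cost(
    invocations=1_000_000,
    seconds_per_call=0.2,
    memory_gb=0.5,
    rate_per_gb_second=0.00002,
)
```

Under these assumed numbers the VM is billed for all 720 hours, while the functions are billed for 100,000 GB-seconds of actual execution. The comparison reverses for workloads that keep a machine busy most of the time.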
Continuous Machine Learning Deployment with Serverless, AWS and Snowflake - WebSystemer.no
Anyone who has built a machine learning model will know the feeling… "How do I get my masterpiece out of this Python notebook and in front of the world?". Answering this question is rarely simple, and with a multitude of different options to consider, this can be a huge source of both technical debt for data science teams and dependency on engineering resource. At HeadBox we have developed a lean deployment pipeline for simple machine learning models that are used in our venue recommendation engines. Here I will demonstrate the deployment of a simple classification model using three Serverless Lambda functions, pulling data from a data warehouse such as Snowflake, posting results to S3 buckets and DynamoDB tables, as well as posting daily performance updates to Slack. Our first Serverless function will be used to pull training data from Snowflake, perform feature engineering and train a simple decision tree model.