

How to Use Google Cloud and GPU Build Simple Deep Learning Environment


Google Cloud Platform provides us with a wealth of resources to support data science, deep learning, and AI projects. All we need to care about now is how to design and train models; the platform manages the remaining tasks. In the current pandemic environment, the entire process of an AI project, from design and coding to deployment, can be done remotely on the Cloud Platform. IMPORTANT: If you get a quota notification when you create a VM that contains GPUs, you need to increase your GPU quota.

Beginners Guide to Cloud Computing


Imagine you would like to train a deep learning model on thousands of images, but your system does not have a GPU. It would be hard to train large models without a GPU, so you will generally use Google Colab to train your model on Google's GPUs. Now suppose your system storage is full, and you have important documents and videos that need to be stored securely. Google Drive can be one solution: it stores all your files, including documents, images, and videos, up to 15GB, and offers security and backup. The scenarios above are some of the applications of cloud computing. One of the advantages of cloud computing is that you only pay for what you use.

NTT develops distributed deep learning for edge computing


"Our research is investigating a training algorithm to obtain a global model as if it is trained by aggregating data in a single server, even when the data are placed in distributed servers, such as in edge computing," according to the statement. NTT's proposed technology has enabled developers to successfully train a global model in early experiments, even in cases where different types of data are used and the communication between servers is "asynchronous," meaning that each compute node's results are not dependent on receiving data and results from another node. NTT notes that interest in edge computing is growing because of the benefits of lower application latency, and expects that there will be community interest in the application of its research to edge compute and networking services. The company said it will continue to develop the technology for commercial applications, and will release the source code to promote collaboration.
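
The summary above does not describe NTT's algorithm, so the following is only a rough illustration of the general idea of obtaining one global model without moving raw data off each server. It uses plain synchronous weight averaging on a one-parameter linear model, which is a swapped-in textbook technique and not NTT's asynchronous method. The node ranges, the learning rate, and all function names are hypothetical.

```python
import random

def local_sgd(w, data, lr=0.05, epochs=20):
    """Run plain SGD on one node's private (x, y) pairs for the model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of the squared error w.r.t. w
            w -= lr * grad
    return w

def average_round(w_global, node_datasets):
    """Each node trains locally from the global weight; only weights are shared."""
    local_weights = [local_sgd(w_global, data) for data in node_datasets]
    return sum(local_weights) / len(local_weights)

def make_node(rng, low, true_w, n=20):
    """Hypothetical node whose inputs come from its own range [low, low + 1)."""
    xs = [rng.uniform(low, low + 1) for _ in range(n)]
    return [(x, true_w * x) for x in xs]

rng = random.Random(0)
true_w = 3.0
# Three nodes with non-identical input distributions (an assumption for the demo).
node_datasets = [make_node(rng, low, true_w) for low in (0.0, 1.0, 2.0)]

w = 0.0
for _ in range(10):
    w = average_round(w, node_datasets)

print(f"global weight after averaging rounds: {w:.4f}")
```

In this toy setting every node's data is consistent with the same underlying weight, so the averaged model approaches it. Real edge deployments with heterogeneous data and asynchronous communication are exactly where simple averaging tends to struggle, which is the gap the research described above targets.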

Develop, Train and Deploy TensorFlow Models using Google Cloud AI Platform


The TensorFlow ecosystem has become very popular for developing applications involving deep learning. One of the reasons is that it has a strong community, and many tools have been developed around the core library to support developers. In this tutorial, I will guide you through how to prototype models in Google Colab, train them on Google Cloud AI Platform, and deploy the finalized model on Google Cloud AI Platform for production. I will include the working Google Colab notebooks to recreate the work. Google Colab is a free resource for prototyping models in TensorFlow and comes with various runtimes.

A Review on Computational Intelligence Techniques in Cloud and Edge Computing (Artificial Intelligence)

Cloud computing (CC) is a centralized computing paradigm that accumulates resources centrally and provides these resources to users through the Internet. Although CC holds a large number of resources, it may not be acceptable for real-time mobile applications, as it is usually far away from users geographically. On the other hand, edge computing (EC), which distributes resources to the network edge, enjoys increasing popularity in applications with low-latency and high-reliability requirements. EC provides resources in a decentralized manner, which can respond to users' requirements faster than normal CC, but with limited computing capacities. As both CC and EC are resource-sensitive, several big issues arise, such as how to conduct job scheduling, resource allocation, and task offloading, which significantly influence the performance of the whole system. To tackle these issues, many optimization problems have been formulated. These optimization problems usually have complex properties, such as non-convexity and NP-hardness, which may not be addressed by traditional convex optimization-based solutions. Computational intelligence (CI), consisting of a set of nature-inspired computational approaches, has recently exhibited great potential in addressing these optimization problems in CC and EC. This paper provides an overview of research problems in CC and EC and recent progress in addressing them with the help of CI techniques. Informative discussions and future research trends are also presented, with the aim of offering insights to the readers and motivating new research directions.
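
To make the CC versus EC trade-off concrete, here is a minimal sketch of a task offloading decision. It assumes a deliberately simple cost model, completion time = network round trip + required cycles / available capacity, and picks the site with the lower estimate. The `Site` class, the function names, and every number are hypothetical; the paper itself surveys far richer formulations that this greedy rule does not capture.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    round_trip_s: float   # network round-trip time to the site, in seconds
    cycles_per_s: float   # compute capacity available to the task

def completion_time(task_cycles: float, site: Site) -> float:
    """Estimated time to finish a task at a site under the simple cost model."""
    return site.round_trip_s + task_cycles / site.cycles_per_s

def choose_site(task_cycles: float, sites: list) -> Site:
    """Pick the site with the smallest estimated completion time."""
    return min(sites, key=lambda s: completion_time(task_cycles, s))

# Hypothetical figures: the edge node is close but weak, the cloud is far but strong.
edge = Site("edge", round_trip_s=0.005, cycles_per_s=2e9)
cloud = Site("cloud", round_trip_s=0.080, cycles_per_s=40e9)

small = choose_site(1e8, [edge, cloud])    # light task
large = choose_site(1e10, [edge, cloud])   # heavy task

print(small.name, large.name)
```

With these assumed numbers the light task stays at the edge, because the round trip dominates, while the heavy task goes to the cloud, because compute time dominates.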

A Survey on Edge Intelligence (Artificial Intelligence)

Edge intelligence refers to a set of connected systems and devices for data collection, caching, processing, and analysis in locations close to where data is captured based on artificial intelligence. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although recently emerged, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey on the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state-of-the-art and discuss the important open issues and possible theoretical and technical solutions.

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud (Machine Learning)

FPGAs have shown great potential in providing low-latency and energy-efficient solutions for deep neural network (DNN) inference applications. Currently, the majority of FPGA-based DNN accelerators in the cloud run in a time-division multiplexing way for multiple users sharing a single FPGA, and require re-compilation with $\sim$100 s overhead. Such designs lead to poor isolation and heavy performance loss for multiple users, and are far from providing efficient and flexible FPGA virtualization for either public or private cloud scenarios. To solve these problems, we introduce a novel virtualization framework for instruction set architecture (ISA)-based DNN accelerators sharing a single FPGA. We enable isolation by introducing a two-level instruction dispatch module and a multi-core based hardware resources pool. Such designs provide isolated and runtime-programmable hardware resources, further leading to performance isolation for multiple users. On the other hand, to overcome the heavy re-compilation overheads, we propose a tiling-based instruction frame package design and two-stage static-dynamic compilation. Only the light-weight runtime information is re-compiled, with $\sim$1 ms overhead, so the performance is guaranteed for the private cloud. Our extensive experimental results show that the proposed virtualization design achieves 1.07-1.69x and 1.88-3.12x throughput improvement over previous static designs using the single-core and the multi-core architectures, respectively.

Kubernetes Gets an Automated ML Workflow


A stable version of an automation tool released this week aims to make life easier for machine learning developers training and scaling models, then deploying ML workloads atop Kubernetes clusters. Roughly two years after its open source release, Kubeflow 1.0 leverages the de facto standard cluster orchestrator to aid data scientists and ML developers in tapping cloud resources to run those workloads in production. Among the stable workflow applications released on Monday (March 2) are a central dashboard, Jupyter notebook controller and web application along with TensorFlow and PyTorch operators for distributed training. Contributors from Google, IBM, Cisco Systems, Microsoft and data management specialist Arrikto said Jupyter notebooks can be used to streamline model development. Other tools can then be used to build application containers and leverage Kubernetes resources to train models.

Analyzing CNN Based Behavioural Malware Detection Techniques on Cloud IaaS (Machine Learning)

Cloud Infrastructure as a Service (IaaS) is vulnerable to malware due to its exposure to external adversaries, making it a lucrative attack vector for malicious actors. A datacenter infected with malware can cause data loss and/or major disruptions to service for its users. This paper analyzes and compares various Convolutional Neural Networks (CNNs) for online detection of malware in cloud IaaS. The detection is performed based on behavioural data using process-level performance metrics including CPU usage, memory usage, disk usage, etc. We have used the state-of-the-art DenseNets and ResNets to effectively detect malware in an online cloud system. The CNNs are designed to extract features from data gathered from live malware running in a real cloud environment. Experiments are performed on an OpenStack (a cloud IaaS software) testbed designed to replicate a typical 3-tier web architecture. Comparative analysis is performed across different metrics for the different CNN models used in this research.

Reinforcement Learning-based Autoscaling of Workflows in the Cloud: A Survey (Machine Learning)

Reinforcement Learning (RL) has demonstrated great potential for automatically solving decision-making problems in complex uncertain environments. Basically, RL proposes a computational approach that allows learning through interaction in an environment of stochastic behavior, with agents taking actions to maximize some cumulative short-term and long-term rewards. Some of the most impressive results have been shown in Game Theory, where agents exhibited super-human performance in games like Go or Starcraft 2, which led to its adoption in many other domains including Cloud Computing. In particular, workflow autoscaling exploits Cloud elasticity to optimize the execution of workflows according to given optimization criteria. This is a decision-making problem in which it is necessary to establish when and how to scale computational resources up or down, and how to assign them to the upcoming processing workload. Such actions have to be taken considering some optimization criteria in the Cloud, a dynamic and uncertain environment. Motivated by this, many works apply RL to the autoscaling problem in the Cloud. In this work we exhaustively survey those proposals from major venues, and uniformly compare them based on a set of proposed taxonomies. We also discuss open problems and provide a perspective on future research in the area.
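
As a toy illustration of the scale-up/down decision described above, the sketch below applies tabular Q-learning to a made-up environment. The state is the current number of VMs, the actions are remove one, keep, or add one, and the reward penalizes the distance from a fixed demand. The environment, reward, and hyperparameters are all assumptions chosen for brevity and are not taken from any surveyed proposal, which typically deal with stochastic workloads and cost models.

```python
import random

ACTIONS = (-1, 0, 1)       # remove a VM, keep the pool, add a VM
MIN_VMS, MAX_VMS = 1, 5
DEMAND = 3                 # hypothetical number of VMs the workload needs

def step(vms, action):
    """Apply a scaling action; the reward penalizes over- and under-provisioning."""
    nxt = max(MIN_VMS, min(MAX_VMS, vms + action))
    return nxt, -abs(nxt - DEMAND)

def train(episodes=500, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MIN_VMS, MAX_VMS + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randint(MIN_VMS, MAX_VMS)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                        # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
            nxt, reward = step(s, a)
            best_next = max(q[(nxt, act)] for act in ACTIONS)
            # Standard Q-learning update toward the bootstrapped target.
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = nxt
    return q

q_table = train()
# Greedy policy: the preferred scaling action for each pool size.
policy = {
    s: max(ACTIONS, key=lambda a: q_table[(s, a)])
    for s in range(MIN_VMS, MAX_VMS + 1)
}
print(policy)
```

The learned greedy policy should add VMs when the pool is below the demand, remove them when it is above, and hold steady when they match.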