How Virtual GPUs Enhance Sharing in Kubernetes for Machine Learning on VMware vSphere

#artificialintelligence

This optimizes the use of the GPU hardware, allowing it to serve more than one user and reducing costs. A basic level of familiarity with the core concepts of Kubernetes and GPU acceleration will be useful to the reader of this article. We first look more closely at pods in Kubernetes and how they relate to a GPU. A pod is the lowest-level unit of deployment in Kubernetes, and it can contain one or more containers. The lifetimes of the containers within a pod tend to be about the same, although one container may start before the others, as the "init" container. You can deploy higher-level objects, such as Kubernetes services and deployments, that contain many pods. This article focuses on pods and their use of GPUs. Given access rights to a Tanzu Kubernetes cluster (TKC) running on the VMware vSphere with Tanzu environment (i.e., a set of host servers running the ESXi hypervisor, managed by VMware vCenter), a user can issue kubectl commands to deploy GPU-consuming pods, as sketched below.
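
As an illustration (not the article's own example), the sketch below uses the official Kubernetes Python client to create a pod that requests one GPU through the NVIDIA device plugin's nvidia.com/gpu resource. The pod name and container image are hypothetical, and the sketch assumes the device plugin is installed and a kubeconfig for the TKC is available:

    # Minimal sketch: request one GPU for a pod via the standard
    # device-plugin resource "nvidia.com/gpu". Names and image are
    # illustrative, not from the article.
    from kubernetes import client, config

    config.load_kube_config()  # use the current TKC context from kubeconfig

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-test"),  # hypothetical name
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="cuda",
                    image="nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04",  # assumed NGC image
                    command=["nvidia-smi"],
                    # The scheduler places the pod on a node advertising
                    # this resource, i.e. one backed by a (virtual) GPU.
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Because the GPU is requested as a schedulable resource rather than bound to a particular host, several such pods can share the pool of virtual GPUs described above.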


Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications

arXiv.org Artificial Intelligence

Deep Learning-based (DL) applications are becoming increasingly popular and advancing at an unprecedented pace. While much research is being undertaken to enhance Deep Neural Networks (DNNs) -- the centerpiece of DL applications -- the practical deployment challenges of these applications in Cloud and Edge systems, and their impact on the usability of the applications, have not been sufficiently investigated. In particular, the impact of deploying different virtualization platforms, offered by the Cloud and Edge, on the usability of DL applications (in terms of End-to-End (E2E) inference time) has remained an open question. Importantly, resource elasticity (by means of scale-up), CPU pinning, and processor type (CPU vs GPU) configurations have been shown to influence the virtualization overhead. Accordingly, the goal of this research is to study the impact of these potentially decisive deployment options on the E2E performance, and thus the usability, of DL applications. To that end, we measure the impact of four popular execution platforms (namely, bare metal, virtual machine (VM), container, and container in VM) on the E2E inference time of four types of DL applications, upon changing the processor configuration (scale-up, CPU pinning) and processor type. This study reveals a set of interesting and sometimes counter-intuitive findings that can serve as best practices for Cloud solution architects to efficiently deploy DL applications in various systems. The notable finding is that solution architects must be aware of the DL application characteristics, particularly their pre- and post-processing requirements, to optimally choose and configure an execution platform, determine the use of a GPU, and decide the efficient scale-up range.
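
To make the measurement concrete, the following minimal sketch (not the paper's harness) decomposes E2E inference time into pre-processing, inference, and post-processing phases; the "model" is a stand-in matrix multiply, and the Linux-only os.sched_setaffinity call in the comment shows one way to apply the CPU pinning configuration the study varies:

    # Sketch of an E2E timing breakdown: the paper's key point is that
    # pre- and post-processing can dominate E2E time, which changes the
    # best platform/processor choice. The model here is a stand-in.
    import os
    import time
    import numpy as np

    # CPU pinning (one of the studied configurations), Linux only:
    # os.sched_setaffinity(0, {0, 1})  # pin this process to cores 0-1

    def preprocess(raw):
        # stand-in for e.g. image resize/normalization
        return (raw - raw.mean()) / (raw.std() + 1e-8)

    def infer(x, weights):
        # stand-in for a DNN forward pass
        return x @ weights

    def postprocess(y):
        # stand-in for e.g. decoding logits into labels
        return y.argmax(axis=-1)

    raw = np.random.rand(64, 1024).astype(np.float32)
    weights = np.random.rand(1024, 10).astype(np.float32)

    phases = {}
    t0 = time.perf_counter()
    x = preprocess(raw)
    phases["pre"] = time.perf_counter() - t0
    t0 = time.perf_counter()
    y = infer(x, weights)
    phases["inference"] = time.perf_counter() - t0
    t0 = time.perf_counter()
    labels = postprocess(y)
    phases["post"] = time.perf_counter() - t0

    e2e = sum(phases.values())
    for name, t in phases.items():
        print(f"{name:>9}: {t * 1e3:8.3f} ms  ({100 * t / e2e:4.1f}% of E2E)")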


Nvidia and VMware team up to help enterprises scale up AI development

#artificialintelligence

Enterprises can begin to run trials of their AI projects using VMware vSphere with Tanzu together with Nvidia AI Enterprise software suite, as part of moves by both companies to further simplify AI development and application management. By extending testing to vSphere with Tanzu, Nvidia boasts it will enable developers to run AI workloads on Kubernetes containers within their existing VMware environments. The software suite will run on mainstream Nvidia-certified systems, the company said, noting it would provide a complete software and hardware stack suitable for AI development. "Nvidia has gone and invested in building all of the next-generation cloud application-level components, where you can now take the NGC libraries, which are container-based, and run those in a Kubernetes orchestrated VMware environment, so you're getting the ability now to go and bridge the world of developers and infrastructure," VMware cloud infrastructure business group marketing VP Lee Caswell told media. The move comes off the back of VMware announcing Nvidia AI Enterprise in March.


The Top 100 Software Companies of 2021

#artificialintelligence

The Software Report is pleased to announce The Top 100 Software Companies of 2021. This year's awardee list comprises a wide range of companies, from the most well-known, such as Microsoft, Adobe, and Salesforce, to the relatively newer but rapidly growing Qualtrics, Atlassian, and Asana. A good number of awardees may be new names to some, but that should be no surprise given that software has always been an industry of startups that seemingly came out of nowhere to create and dominate a new space. Software has become the backbone of our economy. From large enterprises to small businesses, almost all rely on software, whether for accounting, marketing, sales, supply chain, or a myriad of other functions. Software has become the dominant industry of our time, and as such, we place significance on highlighting the best companies leading the industry forward. The following awardees were nominated and selected based on a thorough evaluation process. Among the key criteria considered were ...


IBM and AMD Begin Cooperation on Cybersecurity and AI

#artificialintelligence

International Business Machines (IBM) and Advanced Micro Devices (AMD) said they began a development program focused on cybersecurity and artificial intelligence. The development agreement will build on "open-source software, open standards, and open system architectures to drive confidential computing in hybrid cloud environments," the companies said in a statement. The agreement also will "support a broad range of accelerators across high-performance computing and enterprise critical capabilities, such as virtualization and encryption," they said. AMD, Santa Clara, Calif., is one of the world's biggest chipmakers and is thriving. IBM, the storied Armonk, N.Y., technology services company, has struggled to regain the glory of its past, when it led the computer-making industry.


Dell Technologies rolls out systems for HPC, AI workloads leveraging VMware's Bitfusion

ZDNet

AI and ML deployments are well underway, but for CXOs the biggest issues will be managing these initiatives, figuring out where the data science team fits in, and deciding which algorithms to buy versus build. Dell Technologies is rolling out a series of designs and systems that aim to speed up artificial intelligence deployments by using VMware's acquired Bitfusion technology. Two Dell EMC Ready Solutions are based on VMware Validated Designs and combine Dell EMC hardware with VMware Cloud Foundation and Bitfusion AI management tools in VMware vSphere 7. Dell Technologies claims to be among the first IT companies to equip systems to run AI workloads within VMware environments. Ravi Pendekanti, senior vice president of product management and marketing for Dell Technologies' server unit, said the new systems are designed to run AI anywhere and take advantage of underutilized GPUs. "GPU instances are being underutilized and that is holding back AI," said Pendekanti.
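
For context on the underutilization claim, the short sketch below (an illustration, not part of the Dell or VMware tooling) reads each GPU's instantaneous utilization through NVIDIA's NVML bindings; it assumes an NVIDIA driver and the nvidia-ml-py package (which provides the pynvml module) are installed:

    # Query per-GPU compute and memory-activity utilization via NVML.
    # Persistently low numbers across a fleet are the kind of evidence
    # behind the "GPU instances are being underutilized" observation.
    import pynvml

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print(f"GPU {i}: {util.gpu}% compute, {util.memory}% memory activity")
    finally:
        pynvml.nvmlShutdown()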


Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud

arXiv.org Machine Learning

FPGAs have shown great potential in providing low-latency and energy-efficient solutions for deep neural network (DNN) inference applications. Currently, the majority of FPGA-based DNN accelerators in the cloud serve multiple users sharing a single FPGA in a time-division-multiplexed fashion and require re-compilation with $\sim$100 s of overhead. Such designs lead to poor isolation and heavy performance loss for multiple users, falling far short of efficient and flexible FPGA virtualization for either public or private cloud scenarios. To solve these problems, we introduce a novel virtualization framework for instruction set architecture (ISA)-based DNN accelerators sharing a single FPGA. We enable isolation by introducing a two-level instruction dispatch module and a multi-core-based hardware resource pool. These designs provide isolated and runtime-programmable hardware resources, which in turn yield performance isolation for multiple users. To overcome the heavy re-compilation overhead, we propose a tiling-based instruction frame package design and a two-stage static-dynamic compilation scheme. Only the lightweight runtime information is re-compiled, with $\sim$1 ms of overhead, so performance is guaranteed for the private cloud. Our extensive experimental results show that the proposed virtualization design achieves 1.07-1.69x and 1.88-3.12x throughput improvement over previous static designs using single-core and multi-core architectures, respectively.
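
The following toy model (a conceptual sketch, not the paper's hardware design) illustrates the two ideas named in the abstract: a multi-core hardware resource pool that hands each user an isolated set of cores, and a two-level dispatch that routes a user's tiled instruction frames only to the cores that user owns:

    # Conceptual sketch of isolation via a core pool plus two-level
    # dispatch. Class and variable names are hypothetical.
    from collections import deque

    class VirtualizedAccelerator:
        def __init__(self, num_cores):
            self.free_cores = deque(range(num_cores))
            self.allocation = {}  # user -> list of owned core ids

        def allocate(self, user, n_cores):
            # First level: carve isolated cores out of the shared pool.
            if len(self.free_cores) < n_cores:
                raise RuntimeError("resource pool exhausted")
            self.allocation[user] = [self.free_cores.popleft()
                                     for _ in range(n_cores)]

        def dispatch(self, user, frames):
            # Second level: round-robin the user's tiled instruction
            # frames across only that user's cores, preserving isolation.
            cores = self.allocation[user]
            return [(frames[i], cores[i % len(cores)])
                    for i in range(len(frames))]

    acc = VirtualizedAccelerator(num_cores=8)
    acc.allocate("tenant-a", 2)
    acc.allocate("tenant-b", 4)
    print(acc.dispatch("tenant-a", ["frame0", "frame1", "frame2"]))

Because each tenant only ever touches its own cores, one tenant's workload cannot degrade another's throughput, which is the performance-isolation property the abstract claims.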


A Survey on the Use of Preferences for Virtual Machine Placement in Cloud Data Centers

arXiv.org Artificial Intelligence

With the rapid development of virtualization techniques, cloud data centers allow for cost-effective, flexible, and customizable deployments of applications on virtualized infrastructure. Virtual machine (VM) placement aims to assign each virtual machine to a server in the cloud environment, and it is of paramount importance to the design of cloud data centers. Typically, VM placement involves complex relations and multiple design factors, as well as local policies that govern the assignment decisions. It also involves different constituents, including cloud administrators and customers, who might have disparate preferences while opting for a placement solution. Thus, it is often valuable to return not only an optimized solution to the VM placement problem but also a solution that reflects the given preferences of the constituents. In this paper, we provide a detailed review of the role of preferences in the recent literature on VM placement. We further discuss key challenges and identify possible research opportunities to better incorporate preferences within the context of VM placement.
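
As a concrete (and deliberately simplified) illustration of preference-aware placement, the sketch below ranks feasible servers by a weighted sum of preference scores, with the weights standing in for the disparate priorities of administrators and customers; the criteria names and values are hypothetical, not drawn from the survey:

    def place_vms(vms, servers, weights):
        """Greedily assign each VM to the feasible server with the best
        weighted preference score, subject to CPU capacity constraints."""
        placement = {}
        # Place the largest VMs first (first-fit-decreasing style).
        for vm in sorted(vms, key=lambda v: v["cpu"], reverse=True):
            feasible = [s for s in servers if s["cpu"] >= vm["cpu"]]
            if not feasible:
                raise RuntimeError(f"no server can host {vm['name']}")
            # Rank feasible servers by the weighted preference score.
            best = max(feasible,
                       key=lambda s: sum(weights[c] * s[c] for c in weights))
            best["cpu"] -= vm["cpu"]  # consume capacity
            placement[vm["name"]] = best["name"]
        return placement

    servers = [
        {"name": "s1", "cpu": 16, "energy": 0.9, "locality": 0.2},
        {"name": "s2", "cpu": 8,  "energy": 0.4, "locality": 0.8},
    ]
    vms = [{"name": "vm1", "cpu": 4}, {"name": "vm2", "cpu": 10}]

    # Here the administrator weights energy efficiency over data locality.
    print(place_vms(vms, servers, weights={"energy": 0.7, "locality": 0.3}))

Changing the weights changes the placement, which is exactly why the survey argues that solutions should reflect the constituents' stated preferences rather than a single fixed objective.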


Accelerating AI With GPU Virtualization In The Cloud

#artificialintelligence

In July, VMware acquired Bitfusion, a company whose technology virtualizes compute accelerators with the goal of enabling modern workloads like artificial intelligence and data analytics to take full advantage of systems with GPUs or FPGAs. Specifically, Bitfusion's software allows virtual machines to offload compute duties to GPUs, FPGAs, or even other kinds of ASICs. The deal didn't get a ton of attention at the time, but for VMware it was an important step in realizing its cloud ambitions. "Hardware acceleration for applications delivers efficiency and flexibility into the AI space, including subsets such as machine learning," Krish Prasad, senior vice president and general manager of VMware's Cloud Platform business unit, wrote in a blog post announcing the acquisition. "Unfortunately, hardware accelerators today are deployed with bare-metal practices, which force poor utilization, poor efficiencies, and limit organizations from sharing, abstracting and automating the infrastructure. This provides a perfect opportunity to virtualize them – providing increased sharing of resources and lowering costs."