AITopics | host server

Information about AI from the News, Publications, and Conferences

vSphere 8 Expands Machine Learning Support: Device Groups for NVIDIA GPUs and NICs

#artificialintelligence | Sep-14-2022, 21:57:39 GMT

Data scientists and machine learning developers are building and training very large models that need more GPU memory than before. Many of these larger ML applications need more than one NVIDIA GPU device on the vSphere servers on which they run, or they may need to communicate between separate GPUs over the local network. Multiple GPUs can be used to expand the overall GPU framebuffer memory capacity or for other reasons. Servers with eight or more physical GPUs are now on the market, and the number of GPUs per server will likely grow over time. With vSphere 8, you can add up to 8 virtual GPUs (vGPUs) to one VM.

device group, gpus, vsphere 8, (14 more...)

Industry:

Information Technology > Hardware (0.63)
Information Technology > Software (0.40)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.91)

Performance at Scale: Graphcore's Latest MLPerf Training Results

#artificialintelligence | Jul-28-2022, 12:12:50 GMT

Graphcore's latest submission to MLPerf demonstrates two things very clearly: our IPU systems are getting larger and more efficient, and our software maturity means they are also getting faster and easier to use. Software optimisation continues to deliver significant performance gains, with our IPU-POD16 now outperforming Nvidia's DGX A100 on the computer vision model ResNet-50. Training ResNet-50 takes 28.3 minutes on the IPU-POD16, compared to 29.1 minutes on the DGX A100. That is a performance improvement of 24% since our first submission, achieved through software alone. It is a significant milestone, given that ResNet-50 has traditionally been a showpiece model for GPUs. Our software-driven performance gain for ResNet-50 on the IPU-POD64 was even greater, at 41%.
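
The two ResNet-50 training times quoted above can be compared directly. The following is a small worked calculation based only on those two figures; it is not part of Graphcore's submission, and the function names are illustrative.

```python
# Training times quoted in the summary above, in minutes.
IPU_POD16_MINUTES = 28.3
DGX_A100_MINUTES = 29.1


def speedup(baseline_minutes: float, candidate_minutes: float) -> float:
    """Ratio of baseline time to candidate time (values above 1 mean faster)."""
    return baseline_minutes / candidate_minutes


def time_saved_fraction(baseline_minutes: float, candidate_minutes: float) -> float:
    """Fraction of the baseline training time that the candidate saves."""
    return (baseline_minutes - candidate_minutes) / baseline_minutes


if __name__ == "__main__":
    ratio = speedup(DGX_A100_MINUTES, IPU_POD16_MINUTES)
    saved = time_saved_fraction(DGX_A100_MINUTES, IPU_POD16_MINUTES)
    print(f"IPU-POD16 vs DGX A100 speedup: {ratio:.3f}x")
    print(f"Training time saved: {saved:.1%}")
```

On these numbers the gap between the two systems is a few percent, so the 24% figure describes the gain over Graphcore's own earlier submission rather than the margin over the DGX A100.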

graphcore, host server, latest mlperf training result, (12 more...)

Industry: Information Technology (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (0.73)
Information Technology > Artificial Intelligence > Natural Language (0.56)

How Virtual GPUs Enhance Sharing in Kubernetes for Machine Learning on VMware vSphere

#artificialintelligence | Mar-12-2022, 06:15:34 GMT

This optimizes the use of the GPU hardware, which can serve more than one user, reducing costs. A basic familiarity with the core concepts of Kubernetes and GPU acceleration will be useful to the reader of this article. We first look more closely at pods in Kubernetes and how they relate to a GPU. A pod is the lowest-level unit of deployment in Kubernetes. A pod can have one or more containers within it. The lifetimes of the containers within a pod tend to be about the same, although one container may start before the others, as the "init" container. You can deploy higher-level objects, such as Kubernetes services and deployments, that have many pods in them. We focus on pods and their use of GPUs in this article. Given access rights to a Tanzu Kubernetes cluster (TKC) running on the VMware vSphere with Tanzu environment (i.e. a set of host servers running the ESXi hypervisor, managed by VMware vCenter), a user can issue the command:
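
The command mentioned above is not part of this summary. As a separate illustration of how a pod, its containers, and a GPU relate, the following is a minimal sketch that builds a Pod manifest as a Python dictionary. It assumes the cluster exposes GPUs under the extended resource name `nvidia.com/gpu`, which is how the NVIDIA device plugin for Kubernetes commonly advertises them; the source does not confirm this for the Tanzu setup. The pod name, the image names, and the `gpu_pod_manifest` helper are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpu_count: int = 1) -> dict:
    """Build a Pod manifest with one init container and one GPU-requesting container."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            # The init container runs to completion before the main container starts.
            "initContainers": [
                {
                    "name": "prepare",
                    "image": "busybox",  # hypothetical helper image
                    "command": ["sh", "-c", "echo preparing"],
                }
            ],
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # Assumption: GPUs are advertised as "nvidia.com/gpu".
                    "resources": {"limits": {"nvidia.com/gpu": str(gpu_count)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("gpu-demo", "example/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The JSON printed by this sketch could be saved to a file and passed to `kubectl apply -f`, which accepts JSON as well as YAML manifests.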

gpu, physical gpu, pod, (16 more...)

Industry: Information Technology > Software (0.83)

Technology:

Information Technology > Virtualization (1.00)
Information Technology > Hardware (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Distributed Machine Learning on VMware vSphere with GPUs and Kubernetes: a Webinar - Virtualize Applications

#artificialintelligence | Oct-12-2019, 22:49:21 GMT

This article directs you to a recent webinar that VMware produced on running distributed machine learning with TensorFlow and Horovod on a set of VMs across multiple vSphere host servers. Many machine learning problems are tackled using a single host server today (with a collection of VMs on that host). However, when your ML model or data grows too large for one host to handle, or your GPU power is dispersed across several physical host servers or VMs, distributing the work is the way to handle that scenario. The VMware webinar first introduces the concepts of machine learning in general. It then gives a short description of Horovod for distributed training and explains the importance of low-latency networking between the nodes in the distributed model, based here on Mellanox RDMA over Converged Ethernet (RoCE) technology.
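
To make the role of the network concrete: in data-parallel training of the kind Horovod coordinates, each worker computes gradients on its own slice of the data, the workers average those gradients (an allreduce), and every worker applies the same update. The sketch below simulates that averaging step in a single process with plain Python lists. It does not use Horovod or TensorFlow, and the function names `allreduce_average` and `sgd_step` are illustrative.

```python
from typing import List


def allreduce_average(worker_gradients: List[List[float]]) -> List[float]:
    """Average per-worker gradient vectors element by element."""
    if not worker_gradients:
        raise ValueError("need at least one worker")
    length = len(worker_gradients[0])
    if any(len(g) != length for g in worker_gradients):
        raise ValueError("all gradient vectors must have the same length")
    n = len(worker_gradients)
    return [sum(g[i] for g in worker_gradients) / n for i in range(length)]


def sgd_step(weights: List[float], gradient: List[float], lr: float) -> List[float]:
    """Apply one plain gradient-descent update."""
    return [w - lr * g for w, g in zip(weights, gradient)]


if __name__ == "__main__":
    # Four simulated workers, each with gradients from its own data shard.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    averaged = allreduce_average(grads)
    # Every worker applies the same averaged gradient, so replicas stay in sync.
    replicas = [sgd_step([0.0, 0.0], averaged, lr=0.5) for _ in grads]
    print(averaged, replicas[0])
```

In a real cluster this averaging moves gradient data between hosts on every training step, which is why latency and bandwidth between the nodes matter.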

gpus and kubernetes, virtualize application, vsphere, (13 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.37)

Industry: Information Technology > Software (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
