We produce a multitude of attributes (characteristics attached to an entity -- building, parcel, etc.) using various sources such as aerial imagery. The idea is to build Deep Learning models from a few thousand buildings, using labels tagged in-house or existing labels from open data. In a second step, the models are deployed on the whole French territory, which represents more than 35 million images to process (i.e. 4 TB of data). This second step is the focus of this post. The challenge is to run inference at low cost and in a short amount of time (less than a day).
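To give a feel for the scale of the nationwide inference run, here is a back-of-the-envelope sketch of the sustained throughput implied by 35 million images and 4 TB in one day. The figures come from the paragraph above (read as lower bounds, since the post says "more than" 35 million). The 24-hour deadline, decimal terabytes, and the per-worker rate of 50 images per second are assumptions chosen only for illustration.

```python
import math

NUM_IMAGES = 35_000_000          # "more than 35 million images", so a lower bound
TOTAL_BYTES = 4 * 10**12         # 4 TB, assuming decimal terabytes
DEADLINE_SECONDS = 24 * 60 * 60  # "less than a day", taken here as 24 hours


def required_throughput(num_images, total_bytes, deadline_seconds):
    """Return (images per second, bytes per second) needed to meet the deadline."""
    return num_images / deadline_seconds, total_bytes / deadline_seconds


def workers_needed(images_per_second, per_worker_images_per_second):
    """Number of parallel workers, given a hypothetical per-worker rate."""
    return math.ceil(images_per_second / per_worker_images_per_second)


images_per_s, bytes_per_s = required_throughput(
    NUM_IMAGES, TOTAL_BYTES, DEADLINE_SECONDS
)
avg_image_kb = TOTAL_BYTES / NUM_IMAGES / 1000

print(f"{images_per_s:.0f} images/s, {bytes_per_s / 1e6:.1f} MB/s")
print(f"average image size: {avg_image_kb:.0f} kB")
# 50 images/s per worker is an assumed figure, not a measurement.
print(f"workers at 50 images/s each: {workers_needed(images_per_s, 50)}")
```

Under these assumptions the pipeline has to sustain roughly 405 images per second and about 46 MB/s of reads for a full day, which is why the inference step has to be parallelized.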
Containers have become the unit of deployment not just for data center and cloud workloads but also for edge applications. Along with containers, Kubernetes has become the foundation of the infrastructure. Distributions such as K3s are fueling the adoption of Kubernetes at the edge. I have seen many challenges when working with large retailers and system integrators rolling out Kubernetes-based edge infrastructure. One of them is the ability to mix and match ARM64 and AMD64 devices to run AI workloads.
If you are familiar with Kubeflow, you know KFServing as the platform's model server and inference engine. In September last year, the KFServing project went through a transformation to become KServe. Beyond the name change, KServe is now an independent component that has graduated from the Kubeflow project. The separation allows KServe to evolve as a separate, cloud native inference engine deployed as a standalone model server. Of course, it will continue to have tight integration with Kubeflow, but the two will be treated and maintained as independent open source projects.
PyTorch is an open source machine learning framework, primarily developed by Meta (previously Facebook). PyTorch is extensively used in the research space, and in recent years it has gained immense traction in the industry due to its ease of use and deployment. Vertex AI, a fully managed end-to-end data science and machine learning platform on Google Cloud, has first-class support for PyTorch, making it optimized, compatibility tested and ready to deploy. We started a new blog series - PyTorch on Google Cloud - to uncover, demonstrate and share how to build, train and deploy PyTorch models at scale on Cloud AI Infrastructure using GPUs and TPUs on Vertex AI, and how to create reproducible machine learning pipelines on Google Cloud. This blog post is the home page for the series, with links to the existing and upcoming posts for readers to refer to.
Mobile edge computing (MEC) is considered a novel paradigm for computation-intensive and delay-sensitive tasks in fifth generation (5G) networks and beyond. However, its uncertainty, in the form of dynamics and randomness on the mobile device, wireless channel, and edge network sides, results in high-dimensional, nonconvex, nonlinear, and NP-hard optimization problems. With reinforcement learning (RL), an agent trained by iteratively interacting with the dynamic and random environment can intelligently obtain the optimal policy in MEC. Furthermore, evolved versions of RL, such as deep RL (DRL), can achieve faster convergence and higher learning accuracy based on parametric approximation of the large-scale state-action space. This paper provides a comprehensive research review on RL-enabled MEC and offers insight for development in this area. More importantly, the MEC challenges associated with free mobility, dynamic channels, and distributed services that can be solved by different kinds of RL algorithms are identified, followed by how they can be solved by RL solutions in diverse mobile applications. Finally, the open challenges are discussed to provide helpful guidance for future research on RL training and learning in MEC.
Artificial intelligence (AI), especially deep learning, requires vast amounts of data for training, testing, and validation. Collecting these data and the corresponding annotations requires the implementation of imaging biobanks that provide access to these data in a standardized way. This requires careful design and implementation based on the current standards and guidelines and complying with the current legal restrictions. However, the realization of proper imaging data collections is not sufficient to train, validate and deploy AI as resource demands are high and require a careful hybrid implementation of AI pipelines both on-premise and in the cloud. This chapter aims to help the reader when technical considerations have to be made about the AI environment by providing a technical background of different concepts and implementation aspects involved in data storage, cloud usage, and AI pipelines.
Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension. Most notably, edge-box GPUs lack the memory needed to concurrently house the growing number of (increasingly complex) models for real-time inference. Unfortunately, existing solutions that rely on time/space sharing of GPU resources are insufficient as the required swapping delays result in unacceptable frame drops and accuracy violations. We present model merging, a new memory management technique that exploits architectural similarities between edge vision models by judiciously sharing their layers (including weights) to reduce workload memory costs and swapping delays. Our system, GEMEL, efficiently integrates merging into existing pipelines by (1) leveraging several guiding observations about per-model memory usage and inter-layer dependencies to quickly identify fruitful and accuracy-preserving merging configurations, and (2) altering edge inference schedules to maximize merging benefits. Experiments across diverse workloads reveal that GEMEL reduces memory usage by up to 60.7%, and improves overall accuracy by 8-39% relative to time/space sharing alone.
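The memory argument behind model merging can be illustrated with a toy calculation: if two models contain architecturally identical layers, a single shared copy can serve both. The sketch below is not GEMEL's algorithm. It computes only a best-case memory figure, ignores the accuracy constraints that GEMEL accounts for, and uses hypothetical model names, layer signatures, and 4-byte parameters.

```python
from collections import Counter


def layer_bytes(signature, bytes_per_param=4):
    """Size of one layer given a hypothetical (kind, shape) signature."""
    _, shape = signature
    params = 1
    for dim in shape:
        params *= dim
    return params * bytes_per_param


def unmerged_bytes(models):
    """Memory when every model keeps a private copy of every layer."""
    return sum(layer_bytes(sig) for layers in models.values() for sig in layers)


def merged_bytes_best_case(models):
    """Memory if identical layers are shared across models.

    For each signature we keep as many copies as the most demanding single
    model needs, so layers are shared between models but not within one.
    """
    needed = Counter()
    for layers in models.values():
        for sig, count in Counter(layers).items():
            needed[sig] = max(needed[sig], count)
    return sum(layer_bytes(sig) * count for sig, count in needed.items())


# Two made-up vision models that share a convolutional backbone.
models = {
    "detector": [("conv", (64, 3, 7, 7)), ("conv", (128, 64, 3, 3)), ("fc", (1000, 512))],
    "classifier": [("conv", (64, 3, 7, 7)), ("conv", (128, 64, 3, 3)), ("fc", (10, 512))],
}

before = unmerged_bytes(models)
after = merged_bytes_best_case(models)
print(f"unmerged: {before} B, merged (best case): {after} B, saved: {before - after} B")
```

In this toy case only the backbone layers are shared, while the task-specific heads stay separate. A real system also has to check that the shared weights preserve each model's accuracy.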
Mobile edge computing has become an effective and fundamental paradigm for future autonomous vehicles to offload computing tasks. However, due to the high mobility of vehicles, the dynamics of the wireless conditions, and the uncertainty of arriving computing tasks, it is difficult for a single vehicle to determine the optimal offloading strategy. In this paper, we propose a Digital Twin (DT) empowered task offloading framework for the Internet of Vehicles. As a software agent residing in the cloud, a DT can obtain both global network information, through communications among DTs, and historical information about a vehicle, through communications within the twin. The global network information and historical vehicular information can significantly facilitate offloading. Specifically, to preserve the precious computing resources at different levels for the most appropriate computing tasks, we integrate a learning scheme based on the prediction of future computing tasks in the DT. Accordingly, we model the offloading scheduling process as a Markov Decision Process (MDP) to minimize the long-term cost in terms of a trade-off between task latency, energy consumption, and renting cost of clouds. Simulation results demonstrate that our algorithm can effectively find the optimal offloading strategy, as well as achieve fast convergence and high performance, compared with other existing approaches.
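One common way to write a long-term cost of this kind is as a discounted sum of a weighted per-step cost. The formulation below is only an illustrative sketch. The symbols $T_t$ (task latency), $E_t$ (energy consumption), $R_t$ (cloud renting cost), the weights $w_1, w_2, w_3$, and the discount factor $\gamma$ are assumptions, not the paper's notation.

```latex
\min_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} c_t \right],
\qquad
c_t = w_1 T_t + w_2 E_t + w_3 R_t,
\qquad
w_1, w_2, w_3 \ge 0, \quad 0 \le \gamma < 1 .
```

Here $\pi$ is the offloading policy, and the weights set the trade-off between the three cost terms.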
The demand for AI applications, the ever-growing nature of deep learning models and their increasing complexity mean there is plenty of room for competition when it comes to making computer chips more powerful and efficient for such workloads. GPU juggernaut Nvidia may hold the AI chip crown in multiple respects, but that isn't stopping semiconductor companies both large and small from designing their own AI chip architectures that offer differentiation in terms of features, performance and targeted applications. What follows are the 10 coolest AI chips of 2021, which include processors from semiconductor giants Intel, AMD and Nvidia, computing juggernaut IBM, cloud service providers Google Cloud and Amazon Web Services and AI chip startups Cerebras Systems, Mythic and Syntiant.