Training a recommender model of 100 trillions parameters on Google Cloud


A recommender system is an important component of Internet services today: businesses with billions of dollars in revenue are directly driven by recommendation services at big tech companies. The current landscape of production recommender systems is dominated by deep-learning-based approaches, in which an embedding layer first maps extremely large-scale ID-type features to fixed-length embedding vectors, and complex neural network architectures then use those embeddings to generate recommendations. The continuing advancement of recommender models is often driven by increasing model size: several previously released models have billions of parameters, and very recently some have reached trillions. Every jump in model capacity has brought a significant improvement in quality. The era of 100 trillion parameters is just around the corner.
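
The embedding step described above can be illustrated with a small sketch. This is not the system from the article; it is a toy example that assumes a hashing scheme (the `HashedEmbedding` class, bucket count, and average pooling are all hypothetical choices made here) to show how arbitrary ID features can be mapped to fixed-length vectors before a neural network consumes them.

```python
import random
import zlib


class HashedEmbedding:
    """Toy embedding table: maps arbitrary ID strings to fixed-length vectors.

    Assumption: IDs are hashed into a fixed number of buckets. Production
    systems may use very different storage and lookup strategies.
    """

    def __init__(self, num_buckets, dim, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.dim = dim
        # One small random vector per bucket; a real model would learn these.
        self.table = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(num_buckets)
        ]

    def bucket(self, feature_id):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(feature_id.encode("utf-8")) % self.num_buckets

    def lookup(self, feature_id):
        return self.table[self.bucket(feature_id)]


def pooled_embedding(emb, feature_ids):
    """Average the vectors of several IDs into one fixed-length vector."""
    vectors = [emb.lookup(f) for f in feature_ids]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


emb = HashedEmbedding(num_buckets=1000, dim=8)
user_vec = pooled_embedding(emb, ["item:42", "item:9001", "item:7"])
print(len(user_vec))  # length equals dim, regardless of how many IDs went in
```

In this sketch the parameter count of the table is simply `num_buckets * dim`, which suggests why embedding tables tend to dominate the size of very large recommender models.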

Deploy machine learning models on Google Cloud AI Platform


My course is meant for anyone who already knows how to build machine learning and deep learning models and wants to deploy them easily on Google Cloud AI Platform, so that the deployed models can receive POST requests. You should also be familiar with natural language processing and some basic cloud concepts. I will explain everything in the videos. Most importantly, you do not need to be a Python expert to do this.

TensorFlow and the Google Cloud ML Engine for Deep Learning


TensorFlow is quickly becoming the technology of choice for deep learning because of how easy TF makes it to build powerful and sophisticated neural networks. The Google Cloud Platform is a great place to run TF models at scale and to perform distributed training and prediction. This is a comprehensive, from-the-basics course on TensorFlow and building neural networks. It assumes no prior knowledge of TensorFlow; all you need to know is basic Python programming.

PyTorch on Google Cloud: Blog series recap


PyTorch is an open source machine learning framework, primarily developed by Meta (previously Facebook). PyTorch is extensively used in the research space, and in recent years it has gained immense traction in industry due to its ease of use and deployment. Vertex AI, a fully managed end-to-end data science and machine learning platform on Google Cloud, has first-class support for PyTorch, making it optimized, compatibility tested and ready to deploy. We started a new blog series, PyTorch on Google Cloud, to uncover, demonstrate and share how to build, train and deploy PyTorch models at scale on Cloud AI Infrastructure using GPUs and TPUs on Vertex AI, and how to create reproducible machine learning pipelines on Google Cloud. This blog post is the home page for the series, with links to the existing and upcoming posts for readers to refer to.

GitHub - deepmind/xmanager


XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction with experiments is done via XManager's APIs through Python launch scripts. To get started, install XManager and, if needed, its prerequisites, then follow the tutorial or codelab.ipynb to create and run a launch script. Alternatively, a PyPI project is also available.

On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems

Data heterogeneity is an intrinsic property of recommender systems, making models trained over the global data on the cloud, which is the mainstream approach in industry, suboptimal for each individual user's local data distribution. To deal with data heterogeneity, model personalization with on-device learning is a potential solution. However, on-device training using a user's small set of local samples will incur severe overfitting and undermine the model's generalization ability. In this work, we propose a new device-cloud collaborative learning framework, called CoDA, to break the dilemmas of purely cloud-based learning and on-device learning. The key principle of CoDA is to retrieve similar samples from the cloud's global pool to augment each user's local dataset to train the recommendation model. Specifically, after a coarse-grained sample matching on the cloud, a personalized sample classifier is further trained on each device for fine-grained sample filtering, which can learn the boundary between the local data distribution and the outside data distribution. We also build an end-to-end pipeline to support the flows of data, model, computation, and control between the cloud and each device. We have deployed CoDA in a recommendation scenario of Mobile Taobao. Online A/B testing results show a remarkable performance improvement of CoDA over both cloud-based learning without model personalization and on-device training without data augmentation. Overhead testing on a real device demonstrates the computation, storage, and communication efficiency of the on-device tasks in CoDA.
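
The two-stage idea in this abstract (coarse matching on the cloud, then fine-grained filtering on the device) can be sketched roughly as follows. This is not CoDA's implementation: cosine similarity to the user's centroid stands in for the cloud-side matching, and a nearest-centroid rule stands in for the trained personalized classifier. All function names and the toy data are hypothetical.

```python
import math


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarse_match(global_pool, local_samples, top_k):
    """Cloud side (assumed): keep the top_k pool samples closest in angle
    to the centroid of the user's local samples."""
    c = centroid(local_samples)
    ranked = sorted(global_pool, key=lambda s: cosine(s, c), reverse=True)
    return ranked[:top_k]


def fine_filter(candidates, local_samples, outside_samples):
    """Device side (stand-in for a trained classifier): keep candidates that
    lie closer to the local centroid than to the outside centroid."""
    local_c = centroid(local_samples)
    outside_c = centroid(outside_samples)
    return [
        s for s in candidates
        if sq_dist(s, local_c) < sq_dist(s, outside_c)
    ]


# Toy 2-D feature vectors, invented for illustration only.
local = [[1.0, 0.1], [0.9, 0.2]]
pool = [[1.0, 0.0], [0.8, 0.3], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]
outside = [[0.0, 1.0], [-1.0, 0.0]]

candidates = coarse_match(pool, local, top_k=4)
augmented = local + fine_filter(candidates, local, outside)
print(len(candidates), len(augmented))
```

The recommendation model would then be trained on `augmented` rather than on the small local set alone, which is the overfitting remedy the abstract describes.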

AI Technical Considerations: Data Storage, Cloud usage and AI Pipeline

Artificial intelligence (AI), especially deep learning, requires vast amounts of data for training, testing, and validation. Collecting these data and the corresponding annotations requires the implementation of imaging biobanks that provide access to these data in a standardized way. This requires careful design and implementation based on the current standards and guidelines and complying with the current legal restrictions. However, the realization of proper imaging data collections is not sufficient to train, validate and deploy AI as resource demands are high and require a careful hybrid implementation of AI pipelines both on-premise and in the cloud. This chapter aims to help the reader when technical considerations have to be made about the AI environment by providing a technical background of different concepts and implementation aspects involved in data storage, cloud usage, and AI pipelines.

GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge

Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension. Most notably, edge-box GPUs lack the memory needed to concurrently house the growing number of (increasingly complex) models for real-time inference. Unfortunately, existing solutions that rely on time/space sharing of GPU resources are insufficient as the required swapping delays result in unacceptable frame drops and accuracy violations. We present model merging, a new memory management technique that exploits architectural similarities between edge vision models by judiciously sharing their layers (including weights) to reduce workload memory costs and swapping delays. Our system, GEMEL, efficiently integrates merging into existing pipelines by (1) leveraging several guiding observations about per-model memory usage and inter-layer dependencies to quickly identify fruitful and accuracy-preserving merging configurations, and (2) altering edge inference schedules to maximize merging benefits. Experiments across diverse workloads reveal that GEMEL reduces memory usage by up to 60.7%, and improves overall accuracy by 8-39% relative to time/space sharing alone.
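
To make the memory argument concrete, here is a minimal sketch of how sharing architecturally identical layers reduces the total footprint of co-resident models. It is not GEMEL's algorithm: the layer signatures, sizes, and the `shareable` set are hypothetical, and the sketch ignores the retraining needed to keep shared weights accurate for every model.

```python
def unmerged_memory(models):
    """Total memory when every model keeps its own copy of every layer."""
    return sum(size for layers in models.values() for _, size in layers)


def merged_memory(models, shareable):
    """Total memory when each shareable layer signature is stored once."""
    seen = set()
    total = 0
    for layers in models.values():
        for signature, size in layers:
            if signature in shareable:
                if signature in seen:
                    continue  # already resident, reuse the shared copy
                seen.add(signature)
            total += size
    return total


# Hypothetical models: (layer signature, memory in MB). Invented numbers.
models = {
    "detector_a": [("conv3x3_64", 10), ("conv3x3_128", 40), ("head_a", 5)],
    "detector_b": [("conv3x3_64", 10), ("conv3x3_128", 40), ("head_b", 8)],
}
shareable = {"conv3x3_64", "conv3x3_128"}

before = unmerged_memory(models)
after = merged_memory(models, shareable)
print(before, after, f"{100 * (before - after) / before:.1f}% saved")
```

The saving in this toy example comes entirely from the invented sizes and says nothing about the figures reported in the abstract.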

The 10 Coolest AI Chips Of 2021


The demand for AI applications, the ever-growing nature of deep learning models and their increasing complexity mean there is plenty of room for competition when it comes to making computer chips more powerful and efficient for such workloads. GPU juggernaut Nvidia may hold the AI chip crown in multiple respects, but that isn't stopping semiconductor companies both large and small from designing their own AI chip architectures that offer differentiation in terms of features, performance and targeted applications. What follows are the 10 coolest AI chips of 2021, which include processors from semiconductor giants Intel, AMD and Nvidia, computing juggernaut IBM, cloud service providers Google Cloud and Amazon Web Services and AI chip startups Cerebras Systems, Mythic and Syntiant.



We've already seen a hybrid-cloud strategy with multiple data centers and public cloud providers emerge as the standard for large enterprises as the operational toolset continues to evolve and simplify cloud migrations. In 2022, we will see organizations grow their digital footprint by embracing the hybrid and multi-cloud model to enjoy elasticity and agility in the cloud while maintaining tight control of the data they own. Cloud vendors will keep innovating and competing with differentiated capabilities in network connectivity and physical infrastructure improvements, because organizations do not want to be locked in. As the toolset for AI applications continues to evolve, machine learning and deep learning platforms have entered the mainstream and will attain the same level of maturity as specialized data analytics. Just as we currently see a plethora of fully integrated managed services based on Apache Spark and Presto, in 2022 we will see vertical integrations emerging based on the likes of PyTorch and TensorFlow.