Sivasubramaniam, Anand
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
Jain, Rishabh | Bhasi, Vivek M. | Jog, Adwait | Sivasubramaniam, Anand | Kandemir, Mahmut T. | Das, Chita R.
Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs, such as ad serving or movie suggestions. As growing model and dataset sizes push up computation and memory requirements, GPUs are increasingly preferred for executing DLRM inference. However, serving newer DLRMs within acceptable latencies remains challenging, making traditional deployments increasingly GPU-hungry and raising inference serving costs. In this paper, we show that the embedding stage continues to be the primary bottleneck in the GPU inference pipeline, leading to up to a 3.2x embedding-only performance slowdown. To understand the problem thoroughly, we conduct a detailed microarchitectural characterization that reveals low occupancy in the standard embedding kernels. By leveraging direct compiler optimizations, we achieve optimal occupancy, improving performance by up to 53%. Yet long memory-latency stalls persist. To tackle this challenge, we propose specialized plug-and-play software prefetching and L2 pinning techniques, which help hide and reduce these latencies. Further, we propose combining them, as they complement each other. Experimental evaluations using A100 GPUs with large models and datasets show that our proposed techniques improve performance by up to 103% for the embedding stage and by up to 77% for the overall DLRM inference pipeline.
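The abstract does not describe the embedding stage itself. As rough context only, the Python sketch below shows a sum-pooled embedding-bag lookup of the kind DLRM-style models perform: each output vector is the sum of table rows selected by input-dependent indices. The `offsets` convention (start position of each bag within the flat index list) follows PyTorch's `nn.EmbeddingBag`; the function name and data layout are illustrative assumptions. This is neither the authors' GPU kernel nor their prefetching or L2 pinning techniques.

```python
def embedding_bag_sum(table, indices, offsets):
    """Sum-pooled embedding lookup: one pooled vector per bag.

    table:   list of embedding rows (each a list of floats)
    indices: flat list of row ids for all bags in the batch
    offsets: start position of each bag within `indices`
    """
    dim = len(table[0])
    pooled_bags = []
    for b, start in enumerate(offsets):
        # A bag ends where the next one starts (or at the end of `indices`).
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        pooled = [0.0] * dim
        for idx in indices[start:end]:
            # Data-dependent gather: which row is read depends on the input,
            # so on real hardware these reads land at scattered addresses.
            row = table[idx]
            for d in range(dim):
                pooled[d] += row[d]
        pooled_bags.append(pooled)
    return pooled_bags


table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(embedding_bag_sum(table, indices=[0, 2, 1, 1], offsets=[0, 2]))
# [[3.0, 2.0], [0.0, 2.0]]
```

Because each row read depends on the input indices, the stage performs little arithmetic per byte fetched and is sensitive to memory latency, which is the kind of stall that software prefetching and L2 pinning aim to hide or reduce.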
Predicting Vehicular Travel Times by Modeling Heterogeneous Influences Between Arterial Roads
Achar, Avinash (Tata Consultancy Services) | Sarangan, Venkatesh (Tata Consultancy Services) | Regikumar, Rohith (Tata Consultancy Services) | Sivasubramaniam, Anand (Pennsylvania State University)
Predicting the travel times of vehicles in urban settings is a useful and tangible task in the context of intelligent transportation systems. We address the problem of travel time prediction on arterial roads using data sampled from probe vehicles. Only limited literature exists on methods that use probe-vehicle data as input. The spatio-temporal dependencies captured by existing data-driven approaches are either too detailed or overly simplistic. We strike a balance between these extremes, accounting for the varying degrees of influence a given road may experience from its neighbors while controlling the number of parameters to be learned. Specifically, we use a NoisyOR conditional probability distribution (CPD) in conjunction with a dynamic Bayesian network (DBN) to model the state transitions of various roads. We propose an efficient algorithm to learn the model parameters. We also propose an algorithm for predicting travel times on trips of arbitrary durations. Using synthetic and real-world data traces, we demonstrate the superior performance of the proposed method under different traffic conditions.
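The abstract names a NoisyOR CPD but not its exact parameterization. The Python sketch below uses the textbook leaky Noisy-OR form and assumes binary road states (for example congested or free); in a DBN, the parents of a road's state at time t would plausibly be the states of that road and its neighbors at t-1, but that wiring, the state encoding, and all names here are hypothetical. It illustrates the parameter-control point: a binary child with n binary parents needs n + 1 Noisy-OR parameters instead of 2^n entries in a full conditional table.

```python
from math import prod


def noisy_or(active, q, leak=0.0):
    """P(child = 1 | parent states) under a leaky Noisy-OR CPD.

    active: 0/1 state of each parent (e.g. whether a neighboring road was congested)
    q:      q[i] is the probability that parent i, when active, turns the child on
    leak:   probability the child turns on even with no active parent
    """
    if len(active) != len(q):
        raise ValueError("active and q must have the same length")
    # The child stays off only if the leak and every active parent all
    # independently "fail"; inactive parents contribute nothing.
    p_off = (1.0 - leak) * prod(1.0 - qi for a, qi in zip(active, q) if a)
    return 1.0 - p_off


def cpd_parameter_counts(n_parents):
    """Free parameters for a binary child: Noisy-OR vs. a full conditional table."""
    return n_parents + 1, 2 ** n_parents


print(noisy_or([1, 0, 1], [0.5, 0.9, 0.2], leak=0.1))  # approx. 0.64
print(cpd_parameter_counts(10))  # (11, 1024)
```

Each road's influence on a neighbor gets its own weight q[i], which captures heterogeneous influences while keeping the parameter count linear in the number of neighbors.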
Cracks Under Pressure? Burst Prediction in Water Networks Using Dynamic Metrics
Kaushik, Gollakota (Tata Consultancy Services) | Manimaran, Abinaya (Tata Consultancy Services) | Vasan, Arunchandar (Tata Consultancy Services) | Sarangan, Venkatesh (Tata Consultancy Services) | Sivasubramaniam, Anand (Penn State University)
Ranking pipes by their burst likelihood can help a water utility triage its proactive maintenance budget effectively. In the research literature, data-driven approaches have recently been used to predict pipe bursts. Such approaches use static features of individual pipes, such as diameter, length, and material, to estimate burst likelihood for the next year by learning from past historical data. The burst likelihood of a pipe also depends on dynamic features such as its pressure and flow. Existing works ignore dynamic features because these must either be measured or estimated with a well-calibrated hydraulic model, which makes them difficult to obtain accurately. We complement prior data-driven approaches by proposing a methodology to approximately estimate the dynamic features of individual pipes from readily available network structure and other data. We study the error introduced by our approximation on an academic benchmark water network with ground truth. Using a real-world pipe burst dataset obtained from a European water utility over multiple years, we show that our approximate dynamic features improve the ability of machine learning classifiers to predict pipe bursts. The performance (measured as the percentage of future bursts predicted) of the best-performing classifier improves by nearly 50% with these dynamic features.
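The abstract reports performance as the percentage of future bursts predicted but does not give the exact protocol. One plausible reading for maintenance triage is to count how many of next year's bursts fall within a fixed inspection budget of top-ranked pipes; the Python sketch below assumes that protocol, and the function name and `budget` parameter are hypothetical.

```python
def pct_bursts_captured(scores, burst_next_year, budget):
    """Percentage of next-year bursts found among the `budget` top-ranked pipes.

    scores:          predicted burst likelihood per pipe (higher = riskier)
    burst_next_year: 1 if the pipe actually burst in the evaluation year, else 0
    budget:          number of pipes the utility can inspect (assumed cut-off)
    """
    if len(scores) != len(burst_next_year):
        raise ValueError("scores and burst_next_year must have the same length")
    total_bursts = sum(burst_next_year)
    if total_bursts == 0:
        return 0.0
    # Rank pipes from most to least likely to burst.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(burst_next_year[i] for i in ranked[:budget])
    return 100.0 * hits / total_bursts


scores = [0.9, 0.1, 0.5, 0.7]
bursts = [1, 0, 0, 1]
print(pct_bursts_captured(scores, bursts, budget=1))  # 50.0
print(pct_bursts_captured(scores, bursts, budget=2))  # 100.0
```

Under this reading, adding dynamic features helps only insofar as it moves pipes that actually burst higher in the ranking, so more of them fit within the same budget.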