Major, Bence
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Federici, Marco, Belli, Davide, van Baalen, Mart, Jalalirad, Amir, Skliar, Andrii, Major, Bence, Nagel, Markus, Whatmough, Paul
While mobile devices provide ever more compute power, improvements in DRAM bandwidth are much slower. This is unfortunate for large language model (LLM) token generation, which is heavily memory-bound. Previous work has proposed to leverage natural dynamic activation sparsity in ReLU-activated LLMs to reduce effective DRAM bandwidth per token. However, more recent LLMs use SwiGLU instead of ReLU, which results in little inherent sparsity. While SwiGLU activations can be pruned based on magnitude, the resulting sparsity patterns are difficult to predict, rendering previous approaches ineffective. To circumvent this issue, our work introduces Dynamic Input Pruning (DIP): a predictor-free dynamic sparsification approach, which preserves accuracy with minimal fine-tuning. DIP can further use lightweight LoRA adapters to regain some performance lost during sparsification. Lastly, we describe a novel cache-aware masking strategy, which considers the cache state and activation magnitude to further increase cache hit rate, improving LLM token rate on mobile devices. DIP outperforms other methods in terms of accuracy, memory and throughput trade-offs across simulated hardware settings. On Phi-3-Medium, DIP achieves a 46% reduction in memory and 40% increase in throughput with < 0.1 loss in perplexity.
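To make the idea of combining cache state with activation magnitude concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the top-k selection rule, and the penalty factor `gamma` applied to entries whose weights are not cached are all assumptions made for illustration.

```python
import numpy as np

def magnitude_mask(x, k):
    """Keep the k largest-magnitude entries of x (plain magnitude pruning)."""
    idx = np.argpartition(np.abs(x), -k)[-k:]
    mask = np.zeros(x.shape, dtype=bool)
    mask[idx] = True
    return mask

def cache_aware_mask(x, cached, k, gamma=0.5):
    """Keep k entries, ranking by magnitude scaled down for uncached entries.

    `cached[i]` is True if the weights needed for entry i are already in the
    cache. `gamma` in [0, 1] is a hypothetical penalty: gamma=1 reduces to
    plain magnitude pruning, smaller values favor cached entries more.
    """
    score = np.abs(x) * np.where(cached, 1.0, gamma)
    idx = np.argpartition(score, -k)[-k:]
    mask = np.zeros(x.shape, dtype=bool)
    mask[idx] = True
    return mask

# Toy example: five activations, three of which have cached weights.
x = np.array([0.9, 0.1, 0.6, 0.5, 0.4])
cached = np.array([False, True, False, True, True])

plain = magnitude_mask(x, k=2)
aware = cache_aware_mask(x, cached, k=2, gamma=0.5)
print("cache hits (plain):", int((plain & cached).sum()))
print("cache hits (aware):", int((aware & cached).sum()))
```

In this toy case the cache-aware variant trades the second-largest activation for a slightly smaller one whose weights are already cached, which is the kind of accuracy versus hit-rate trade-off the abstract refers to.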
GNSS Positioning using Cost Function Regulated Multilateration and Graph Neural Networks
Jalalirad, Amir, Belli, Davide, Major, Bence, Jee, Songwon, Shah, Himanshu, Morrison, Will
He obtained his Ph.D. in Electrical Engineering from Eindhoven University of Technology in 2016. His research interests include applications of deep learning in positioning, navigation and RF signal processing systems. Davide Belli received his M.S. degree in Artificial Intelligence from the University of Amsterdam in 2019. He is currently a Senior Machine Learning Researcher at Qualcomm AI Research. His research interests include deep learning for the visual and RF domain, model personalization, and graph representation learning. Bence Major is a Staff Engineer at Qualcomm AI Research, leading a research team in the use of artificial intelligence for RF sensing and positioning. His research work focuses on non-visual sensory data, such as radar, ultrasound, and wireless signals. He received his M.S. degree in Computer Science from the Budapest University of Technology and Economics. Songwon Jee received his M.S. degree in Electrical Engineering from Stanford University in 2016. He is currently a Senior Staff Engineer in the Location Technology Team at Qualcomm Technology Inc. His research interests include the application of deep learning for location technology involving GNSS, sensors, and wireless technologies. Himanshu Shah received his M.S. and Ph.D. degrees in Electrical Engineering from Arizona State University in 2004 and 2009, respectively.
Neural 5G Indoor Localization with IMU Supervision
Ermolov, Aleksandr, Kadambi, Shreya, Arnold, Maximilian, Hirzallah, Mohammed, Amiri, Roohollah, Singh, Deepak Singh Mahendar, Yerramalli, Srinivas, Dijkman, Daniel, Porikli, Fatih, Yoo, Taesang, Major, Bence
Radio signals are well suited for user localization because they are ubiquitous, can operate in the dark and maintain privacy. Many prior works learn mappings between channel state information (CSI) and position in a fully supervised manner. However, that approach relies on position labels, which are very expensive to acquire. In this work, this requirement is relaxed by using pseudo-labels during deployment, which are calculated from an inertial measurement unit (IMU). We propose practical algorithms for IMU double integration and training of the localization system. We show decimeter-level accuracy on simulated and challenging real data of 5G measurements. Our IMU-supervised method performs similarly to the fully supervised approach, but requires much less effort to deploy.
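As a rough illustration of what IMU double integration means, the sketch below integrates acceleration twice with the trapezoidal rule to obtain velocity and position. It is not the algorithm proposed in the paper: the function name is hypothetical, and it assumes the accelerations are already expressed in a fixed world frame with gravity removed and sampled at a constant interval `dt`. It also makes no attempt to correct the drift that real IMU data accumulates.

```python
import numpy as np

def double_integrate(acc, dt, v0=None, p0=None):
    """Integrate acceleration twice (trapezoidal rule).

    acc: array of shape (T, D), world-frame acceleration with gravity removed
         (an assumption; raw IMU output needs orientation handling first).
    dt:  constant sampling interval in seconds.
    Returns (vel, pos), each of shape (T, D).
    """
    acc = np.asarray(acc, dtype=float)
    dim = acc.shape[1]
    v0 = np.zeros(dim) if v0 is None else np.asarray(v0, dtype=float)
    p0 = np.zeros(dim) if p0 is None else np.asarray(p0, dtype=float)

    # First integration: acceleration -> velocity.
    dv = 0.5 * (acc[1:] + acc[:-1]) * dt
    vel = np.vstack([v0, v0 + np.cumsum(dv, axis=0)])

    # Second integration: velocity -> position.
    dp = 0.5 * (vel[1:] + vel[:-1]) * dt
    pos = np.vstack([p0, p0 + np.cumsum(dp, axis=0)])
    return vel, pos

# Constant 2 m/s^2 along x for 1 s, sampled every 0.1 s.
acc = np.tile([2.0, 0.0], (11, 1))
vel, pos = double_integrate(acc, dt=0.1)
print(vel[-1], pos[-1])
```

Positions obtained this way could serve as pseudo-labels only over short windows, since any bias in the acceleration grows quadratically in position after two integrations.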
Vision-Assisted Digital Twin Creation for mmWave Beam Management
Arnold, Maximilian, Major, Bence, Massoli, Fabio Valerio, Soriaga, Joseph B., Behboodi, Arash
In the context of communication networks, digital twin technology provides a means to replicate the radio frequency (RF) propagation environment as well as the system behaviour, allowing the performance of a deployed system to be optimized based on simulations. One of the key challenges in the application of Digital Twin technology to mmWave systems is the prevalent channel simulators' stringent requirements on the accuracy of the 3D Digital Twin, reducing the feasibility of the technology in real applications. We propose a practical Digital Twin creation pipeline and a channel simulator that rely only on a single mounted camera and position information. We demonstrate the performance benefits compared to methods that do not explicitly model the 3D environment, on downstream sub-tasks in beam acquisition, using the real-world dataset of the DeepSense6G challenge.