Ji, Sijie
One-shot Federated Learning Methods: A Practical Guide
Liu, Xiang, Tang, Zhenheng, Li, Xia, Song, Yijun, Ji, Sijie, Liu, Zemin, Han, Bo, Jiang, Linshan, Li, Jialin
One-shot Federated Learning (OFL) is a distributed machine learning paradigm that constrains client-server communication to a single round, addressing the privacy and communication-overhead issues associated with the multiple rounds of data exchange in traditional Federated Learning (FL). OFL also shows practical potential for integration with emerging approaches that require collaboratively trained models, such as large language models (LLMs). However, current OFL methods face two major challenges, data heterogeneity and model heterogeneity, which result in subpar performance compared to conventional FL methods. Worse still, despite numerous studies addressing these limitations, a comprehensive summary is still lacking. To address these gaps, this paper presents a systematic analysis of the challenges faced by OFL and thoroughly reviews current methods. We also offer an innovative categorization and analyze the trade-offs of the various techniques. Finally, we discuss the most promising future directions and the technologies that should be integrated into the OFL field. This work aims to provide guidance and insights for future research.
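To fix ideas about the single-round protocol the abstract pivots on, here is a minimal Python sketch: clients train locally once, upload their weights, and the server aggregates in one communication round. It assumes plain parameter averaging as the server-side step and uses toy synthetic clients; none of the names below come from any specific OFL method surveyed.

import copy
import torch
import torch.nn as nn

def local_train(model, data, target, epochs=5, lr=0.01):
    # Client-side training: runs entirely offline before the single upload.
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(data), target).backward()
        opt.step()
    return model.state_dict()

def one_shot_aggregate(global_model, client_states):
    # Server-side: average client parameters after the one and only round.
    avg = {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model

global_model = nn.Linear(10, 2)
client_states = []
for _ in range(3):  # three clients, a single communication round
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    client_states.append(local_train(global_model, x, y))
global_model = one_shot_aggregate(global_model, client_states)

Naive averaging like this is exactly where the data- and model-heterogeneity challenges bite, which is why the surveyed methods replace it with distillation or generative techniques.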
NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT
Zheng, Xinzhe, Ji, Sijie, Pan, Yipeng, Zhang, Kaiwen, Wu, Chenshu
Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have neither fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To enhance tracking accuracy for indoor robotic applications, we introduce NeurIT, a sequence-to-sequence framework that elevates tracking accuracy to a new level. NeurIT employs a Time-Frequency Block-recurrent Transformer (TF-BRT) at its core, combining the strengths of recurrent neural networks (RNNs) and Transformers to learn representative features in both the time and frequency domains. To fully utilize the IMU information, we strategically employ body-frame differentiation of the magnetometer, which considerably reduces the tracking error. NeurIT is implemented on a customized robotic platform and evaluated in various indoor environments. Experimental results demonstrate that NeurIT achieves a mere 1-meter tracking error over a 300-meter distance. Notably, it significantly outperforms state-of-the-art baselines by 48.21% on unseen data. NeurIT also performs comparably to the visual-inertial approach (Tango Phone) in vision-favored conditions and surpasses it in plain environments. We believe NeurIT takes an important step toward practical neural inertial tracking for ubiquitous and scalable tracking of robotic things. NeurIT, including the source code and the dataset, is open-sourced here: https://github.com/NeurIT-Project/NeurIT.
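The body-frame magnetometer differentiation mentioned above can be sketched as follows; the sampling rate, array shapes, and scaling are illustrative assumptions, not NeurIT's actual preprocessing code. Differencing consecutive magnetometer vectors in the body frame suppresses the (locally constant) ambient field while preserving orientation-change information.

import numpy as np

def body_frame_diff(mag, fs=100.0):
    # mag: (T, 3) magnetometer readings in the sensor/body frame.
    # Returns the first-order finite-difference derivative, shape (T-1, 3),
    # scaled by the assumed sampling rate fs (Hz).
    return np.diff(mag, axis=0) * fs

# Toy usage: 1 second of 100 Hz readings around a constant ambient field (in microtesla).
mag = np.random.randn(100, 3) * 0.05 + np.array([22.0, 5.0, -40.0])
d_mag = body_frame_diff(mag)
print(d_mag.shape)  # (99, 3)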
Are You Being Tracked? Discover the Power of Zero-Shot Trajectory Tracing with LLMs!
Yang, Huanqi, Ji, Sijie, Wu, Rucheng, Xu, Weitao
There is a burgeoning discussion around the capability of Large Language Models (LLMs) to act as fundamental components that can be seamlessly incorporated into the Artificial Intelligence of Things (AIoT) to interpret complex trajectories. This study introduces LLMTrack, a model that illustrates how LLMs can be leveraged for zero-shot trajectory recognition by employing a novel single-prompt technique that combines role-play and think-step-by-step methodologies with unprocessed Inertial Measurement Unit (IMU) data. We evaluate the model on real-world datasets designed to challenge it with distinct trajectories in both indoor and outdoor scenarios. In both settings, LLMTrack not only meets but exceeds the performance benchmarks set by traditional machine learning approaches and even contemporary state-of-the-art deep learning models, all without training on specialized datasets. Our results suggest that, with strategically designed prompts, LLMs can tap into their extensive knowledge base and are well equipped to analyze raw sensor data with remarkable effectiveness.
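As a hedged illustration of what such a single prompt might look like, the sketch below combines the role-play and think-step-by-step ingredients with serialized raw IMU readings. The wording and the trajectory labels are hypothetical, not taken from LLMTrack.

def build_trajectory_prompt(imu_window):
    # imu_window: list of (ax, ay, az, gx, gy, gz) tuples sampled at a fixed rate.
    readings = "\n".join(
        f"t={i}: acc=({ax:.3f},{ay:.3f},{az:.3f}) gyro=({gx:.3f},{gy:.3f},{gz:.3f})"
        for i, (ax, ay, az, gx, gy, gz) in enumerate(imu_window)
    )
    return (
        "You are an expert in analyzing IMU sensor data for trajectory recognition.\n"  # role-play
        "Given the raw accelerometer and gyroscope readings below, think step by step\n"
        "and decide whether the device moved along a straight line, a circle, or stayed still.\n"  # step-by-step
        + readings +
        "\nAnswer with exactly one label: straight, circle, or still."
    )

print(build_trajectory_prompt([(0.1, 0.0, 9.8, 0.01, 0.0, 0.02)] * 3))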
HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?
Ji, Sijie, Zheng, Xinzhe, Wu, Chenshu
There is an ongoing debate regarding the potential of Large Language Models (LLMs) as foundation models seamlessly integrated with Cyber-Physical Systems (CPS) to interpret the physical world. In this paper, we carry out a case study to answer the following question: are LLMs capable of zero-shot human activity recognition (HAR)? Our study, HARGPT, presents an affirmative answer by demonstrating that LLMs can comprehend raw IMU data and perform HAR tasks in a zero-shot manner, given only appropriate prompts. HARGPT feeds raw IMU data into LLMs and utilizes role-play and think-step-by-step strategies for prompting. We benchmark HARGPT on GPT-4 using two public datasets with different inter-class similarities and compare it against various baselines based on both traditional machine learning and state-of-the-art deep classification models. Remarkably, LLMs successfully recognize human activities from raw IMU data and consistently outperform all baselines on both datasets. Our findings indicate that, with effective prompting, LLMs can interpret raw IMU data based on their knowledge base, showing promising potential for analyzing raw sensor data of the physical world.
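For completeness, here is a sketch of the end-to-end zero-shot loop under the same prompting strategy; query_llm is a stub standing in for any chat-completion client (e.g., a GPT-4 API wrapper), and the activity labels are assumed for illustration, not HARGPT's actual code.

ACTIVITIES = ["walking", "sitting", "standing", "climbing stairs"]  # assumed labels

def query_llm(prompt):
    # Placeholder: swap in a real chat-completion call (e.g., a GPT-4 client).
    return "Step 1: the acceleration pattern is periodic ... Final answer: walking"

def recognize_activity(imu_window):
    # imu_window: list of (ax, ay, az) accelerometer samples.
    readings = "; ".join(f"({ax:.2f},{ay:.2f},{az:.2f})" for ax, ay, az in imu_window)
    prompt = (
        "You are an expert on human activity analysis from IMU data.\n"        # role-play
        f"Think step by step, then answer with one label from {ACTIVITIES}.\n"  # step-by-step
        f"Accelerometer samples: {readings}"
    )
    reply = query_llm(prompt).lower()
    # Return the first known label that appears in the model's free-form reply.
    return next((a for a in ACTIVITIES if a in reply), "unknown")

print(recognize_activity([(0.1, 0.2, 9.8)] * 10))  # -> "walking" with the stub above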
Enhancing Deep Learning Performance of Massive MIMO CSI Feedback
Ji, Sijie, Li, Mo
CSI feedback is an important problem in massive multiple-input multiple-output (MIMO) technology because the feedback overhead is proportional to the number of sub-channels and the number of antennas, both of which scale with the size of the massive MIMO system. Deep learning-based CSI feedback methods have been widely adopted recently owing to their superior performance. Despite this success, current approaches have not fully exploited the relationship between the characteristics of CSI data and the deep learning framework. Generally, DL-based methods utilize the auto-encoder framework [7], where the encoder learns to compress the original CSI at the UE side and the decoder learns to reconstruct the original CSI at the BS side. The auto-encoder is trained in an unsupervised manner without the need for labeled data and only requires a single run upon deployment for continuous CSI reconstruction, which overcomes the computational inefficiency of traditional CS-based approaches. Massive MIMO is a core technology of next-generation communication systems, e.g., 5G and above. Unlike traditional cell-based communication paradigms, massive MIMO makes better use of spatial diversity and serves users in a cell-free manner. A massive MIMO system is typically equipped with a large number of antennas at the base station (BS), aiming to make full use of spatial diversity by conducting beamforming to concentrate signal energy on a specific user.
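The auto-encoder framework described above can be condensed into a short PyTorch sketch. The 32x32 angular-delay CSI size, the 128-dimensional codeword, and the Tanh output (assuming CSI normalized to [-1, 1]) are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

CSI_DIM = 2 * 32 * 32   # real + imaginary parts of an assumed 32x32 CSI matrix
CODEWORD = 128          # feedback payload; compression ratio = CSI_DIM / CODEWORD

class CsiAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder runs at the UE and produces the codeword that is fed back.
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(CSI_DIM, CODEWORD))
        # Decoder runs at the BS and reconstructs the full CSI matrix.
        self.decoder = nn.Sequential(nn.Linear(CODEWORD, CSI_DIM), nn.Tanh())

    def forward(self, h):
        code = self.encoder(h)              # transmitted from UE to BS
        return self.decoder(code).view_as(h)

model = CsiAutoEncoder()
h = torch.randn(16, 2, 32, 32)              # batch of complex CSI as 2 real channels
loss = nn.functional.mse_loss(model(h), h)  # unsupervised reconstruction objective
loss.backward()

Because the target is the input itself, training needs no labels, matching the unsupervised setup the abstract describes.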
CQNet: Complex Input Quantized Neural Network designed for Massive MIMO CSI Feedback
Ji, Sijie, Sun, Weiping, Li, Mo
The massive Multiple-Input Multiple-Output (MIMO) system is a core technology of next-generation communication. With the growing complexity of CSI in massive MIMO systems, traditional compressive sensing-based CSI feedback has become a bottleneck and is of limited practical use. Recently, numerous deep learning-based CSI feedback approaches have demonstrated their efficiency and potential. However, existing methods lack a reasonable interpretation of the deep learning model, and their accuracy decreases significantly as the CSI compression rate increases. In this paper, starting from the intrinsic properties of the CSI data itself, we devise corresponding deep learning building blocks to compose a novel neural network, CQNet. Experimental results show that CQNet outperforms the state-of-the-art method with less computational overhead, achieving an average performance improvement of 8.07% in both outdoor and indoor scenarios. In addition, this paper investigates the reasons for the decrease in model accuracy at large compression rates and proposes a strategy of embedding a quantization layer to achieve effective compression, by which the average accuracy loss of 67.19% is reduced to 21.96% and the compression rate is increased eightfold over the original benchmark. The massive MIMO technology is considered one of the core technologies of next-generation communication systems, e.g., 5G. By equipping a large number of antennas, the base station (BS) can fully utilize spatial diversity to improve channel capacity.
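An embedded quantization layer of the kind described above can be illustrated with a uniform quantizer trained via a straight-through estimator, a common trick for making rounding differentiable; CQNet's actual quantization scheme may differ, and the bit-width and value range below are assumptions.

import torch
import torch.nn as nn

class UniformQuantizer(nn.Module):
    # Quantizes the feedback codeword to 2^bits levels in (0, 1).
    def __init__(self, bits=4):
        super().__init__()
        self.levels = 2 ** bits - 1

    def forward(self, x):
        x = torch.sigmoid(x)                            # squash codeword into (0, 1)
        q = torch.round(x * self.levels) / self.levels  # uniform rounding
        # Straight-through estimator: forward pass uses q, backward pass
        # treats the rounding as identity so gradients still flow.
        return x + (q - x).detach()

quant = UniformQuantizer(bits=4)
code = torch.randn(8, 128, requires_grad=True)  # e.g., the encoder's codeword
q_code = quant(code)
q_code.sum().backward()                          # gradients reach `code` via the STE

Inserting such a layer between encoder and decoder lets the network be trained end to end with the quantized codeword, which is what allows effective compression with a much smaller accuracy loss.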