LoXR: Performance Evaluation of Locally Executing LLMs on XR Devices
Khan, Dawar, Liu, Xinyu, Mena, Omar, Jia, Donggang, Kouyoumdjian, Alexandre, Viola, Ivan
Abstract--The deployment of large language models (LLMs) on extended reality (XR) devices has great potential to advance the field of human-AI interaction. In the case of direct, on-device model inference, however, selecting the appropriate model and device for specific tasks remains challenging. In this paper, we deploy 17 LLMs across four XR devices--Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro--and conduct a comprehensive evaluation. We devise an experimental setup and evaluate performance on four key metrics: performance consistency, processing speed, memory usage, and battery consumption. For each of the 68 model-device pairs, we assess performance under varying string lengths, batch sizes, and thread counts, analyzing the trade-offs for real-time XR applications. Finally, we propose a unified evaluation method based on Pareto optimality theory to select the optimal device-model pairs with respect to the quality and speed objectives. We believe our findings offer valuable insights to guide future optimization efforts for LLM deployment on XR devices, and our evaluation method can serve as standard groundwork for further research and development in this emerging field. All supplemental materials are available at nanovis.org/Loxr.html.

These models are capable of describing a wide variety of topics, responding at various levels of abstraction, and communicating effectively in multiple languages. They have proven capable of providing users with accurate and contextually appropriate responses. LLMs have quickly found applications in tasks such as spelling and grammar correction [2], generating text on specified topics [3], integration into automated chatbot services, and even generating source code from loosely defined software specifications [4]. Research on language models, and on their multimodal variants integrating language with vision or other modalities, has recently experienced rapid growth. For instance, in computer vision, language models are combined with visual signals to achieve tasks such as verbal scene description and even open-world scene-graph generation [5]. These technologies enable detailed interpretation of everyday objects, inference of relationships among them, and estimation of physical properties such as size, weight, distance, and speed. In user interaction and visualization research, LLMs serve as verbal interfaces to control software functionality or adjust visualization parameters [6], [7]. Through prompt engineering or fine-tuning, loosely defined text can be translated into specific commands that execute desired actions within a system, supported by language model APIs. The capabilities of language models continue to improve significantly from one version to the next.

Xinyu Liu is with King Abdullah University of Science and Technology (KAUST), Saudi Arabia, and also with the University of Electronic Science and Technology of China, Chengdu, China.
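The unified evaluation method ranks device-model pairs by Pareto optimality over the two competing objectives. A minimal sketch of that idea is shown below, assuming each pair is summarized by a quality score and a speed score where higher is better for both; the Pair type, metric choices, and numeric values are illustrative assumptions, not the paper's measured results.

    from typing import NamedTuple

    class Pair(NamedTuple):
        device: str
        model: str
        quality: float  # higher is better (e.g., output-quality score)
        speed: float    # higher is better (e.g., tokens per second)

    def pareto_front(pairs: list[Pair]) -> list[Pair]:
        """Return the pairs not dominated in both quality and speed."""
        front = []
        for p in pairs:
            dominated = any(
                q.quality >= p.quality and q.speed >= p.speed
                and (q.quality > p.quality or q.speed > p.speed)
                for q in pairs
            )
            if not dominated:
                front.append(p)
        return front

    # Illustrative values only; the paper evaluates 68 measured pairs.
    candidates = [
        Pair("Meta Quest 3", "model-A", quality=0.82, speed=14.0),
        Pair("Apple Vision Pro", "model-B", quality=0.88, speed=11.0),
        Pair("Magic Leap 2", "model-C", quality=0.75, speed=9.0),
    ]
    print(pareto_front(candidates))

Under this reading, a pair is kept only if no other pair is at least as good on both objectives and strictly better on one, which yields the frontier from which an application can pick according to its own quality-versus-speed priorities.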
arXiv.org Artificial Intelligence
Feb-13-2025