vr user


RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments

Ding, Shiyi, Chen, Ying

arXiv.org Artificial Intelligence

Figure 1: The hardware setup of our RAG-VR system, including a VR device (left), an edge server and a router (center), and an example of the user interface displayed to a VR user (right).

Recent advances in large language models (LLMs) provide new opportunities for context understanding in virtual reality (VR). However, VR contexts are often highly localized and personalized, limiting the effectiveness of general-purpose LLMs. To address this challenge, we present RAG-VR, the first 3D question-answering system for VR that incorporates retrieval-augmented generation (RAG), which augments an LLM with external knowledge retrieved from a localized knowledge database to improve answer quality. RAG-VR includes a pipeline for extracting comprehensive knowledge about virtual environments and user conditions for accurate answer generation. To ensure efficient retrieval, RAG-VR offloads the retrieval process to a nearby edge server and uses only essential information during retrieval. Moreover, we train the retriever to effectively distinguish among relevant, irrelevant, and hard-to-differentiate information in relation to questions. RAG-VR improves answer accuracy by 17.9%-41.8%. As virtual reality (VR) continues to transform various facets of life, such as entertainment, social interactions, commerce, and education, there is a growing demand for VR applications endowed with context-understanding capabilities [22, 19]. By gaining detailed knowledge of virtual environments and VR users, these applications deliver immersive, personalized experiences, intelligently responding to user queries about their own conditions and surrounding 3D virtual objects.
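The retrieval step at the heart of a RAG pipeline like this can be illustrated with a minimal sketch: rank knowledge-base entries by similarity to the question and prepend the best match to the LLM prompt. The toy bag-of-words embedding and the `augment_prompt` helper below are illustrative assumptions, not the paper's trained retriever, which uses learned embeddings and edge-server offloading.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; RAG-VR instead trains a neural retriever.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, knowledge_base, k=1):
    # Rank knowledge entries by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment_prompt(question, knowledge_base):
    # Prepend the retrieved context to the prompt sent to the LLM.
    context = " ".join(retrieve(question, knowledge_base))
    return f"Context: {context}\nQuestion: {question}"

# Hypothetical knowledge entries extracted from a virtual environment.
kb = [
    "The red cube is on the table next to the lamp.",
    "The player avatar is standing in the kitchen.",
]
prompt = augment_prompt("Where is the red cube?", kb)
```

The LLM then answers from the retrieved context instead of relying solely on its general-purpose training data.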


Personalized Federated Learning for Cellular VR: Online Learning and Dynamic Caching

Tharakan, Krishnendu S., Dahrouj, Hayssam, Kouzayha, Nour, ElSawy, Hesham, Al-Naffouri, Tareq Y.

arXiv.org Artificial Intelligence

Delivering an immersive experience to virtual reality (VR) users through wireless connectivity offers the freedom to engage from anywhere at any time. Nevertheless, it is challenging to ensure seamless wireless connectivity that delivers real-time, high-quality video to VR users. This paper proposes field-of-view (FoV) aware caching for mobile edge computing (MEC)-enabled wireless VR networks. In particular, the FoV of each VR user is cached/prefetched at the base stations (BSs) based on caching strategies tailored to each BS. Specifically, decentralized and personalized federated learning (DP-FL) based caching strategies with guarantees are presented. Considering VR systems composed of multiple VR devices and BSs, a DP-FL caching algorithm is implemented at each BS to personalize content delivery for VR users. The DP-FL algorithm guarantees a probably approximately correct (PAC) bound on the conditional average cache hit. Further, to reduce the cost of communicating gradients, one-bit quantization of the stochastic gradient descent (OBSGD) is proposed, and a convergence guarantee of $\mathcal{O}(1/\sqrt{T})$ is obtained for the proposed algorithm, where $T$ is the number of iterations. Additionally, to better account for wireless channel dynamics, the FoVs are grouped into multicast or unicast groups based on the number of requesting VR users. The performance of the proposed DP-FL algorithm is validated on a realistic VR head-tracking dataset and is shown to outperform baseline algorithms in terms of average delay and cache hit.
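A generic one-bit gradient quantizer of the kind OBSGD builds on can be sketched as follows: transmit only the coordinate signs plus a single scalar scale, cutting per-coordinate communication cost from 32 bits to roughly 1. The mean-absolute-value scale used here is a common choice and an assumption on my part; the paper's exact quantization scheme may differ.

```python
def quantize_one_bit(grad):
    # Compress a gradient to one scalar scale plus a sign per coordinate.
    scale = sum(abs(g) for g in grad) / len(grad)  # assumed: mean |g| scale
    signs = [1 if g >= 0 else -1 for g in grad]
    return scale, signs

def dequantize(scale, signs):
    # Server-side reconstruction of the (lossy) gradient estimate.
    return [scale * s for s in signs]

grad = [0.5, -1.5, 2.0, -0.2]
scale, signs = quantize_one_bit(grad)
recovered = dequantize(scale, signs)
```

The reconstruction is biased toward uniform magnitudes, which is why convergence analyses such as the paper's $\mathcal{O}(1/\sqrt{T})$ bound are needed to show the scheme still converges.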


Federated Multi-View Synthesizing for Metaverse

Guo, Yiyu, Qin, Zhijin, Tao, Xiaoming, Li, Geoffrey Ye

arXiv.org Artificial Intelligence

The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that can efficiently provide computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that uses collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids massive content transmission compared to transmitting tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. The training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication can be achieved with a reduced number of transmitted parameters during federated learning. We also propose a federated transfer learning framework to enable fast domain adaptation to different target domains. Simulation results have demonstrated the effectiveness of our proposed federated multi-view synthesizing framework for VR content delivery.
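The aggregation step underlying a federated learning approach like this one is typically FedAvg-style weighted parameter averaging, sketched below. The flat-list weight representation and the function name are simplifications for illustration, not the paper's implementation.

```python
def fed_avg(client_weights, client_sizes):
    # Aggregate client model parameters, weighted by local dataset size
    # (the classic FedAvg rule). Each client's weights are a flat list.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients; the second holds three times as much data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_w = fed_avg(clients, sizes)  # → [2.5, 3.5]
```

Reducing the number of parameters transmitted in each such round is exactly where the paper's low-latency communication gains come from.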


AI can determine personal information through AR, VR users' motion data, studies say

FOX News

Fox News' Eben Brown reports on how more companies are using AI technology to set retail prices based on data-driven supply and demand. People who participate in augmented and virtual realities are reportedly sharing more information than previously understood through motion data, according to researchers at U.C. Berkeley. In two studies published earlier this year, led by the university, the authors found that users can be identified using just minutes of their head and hand movements. Such collected data can be used to infer dozens of related characteristics, such as age and disability status. "Users are revealing way more information than they think. And there's very little that they can do to prevent that," Vivek Nair, the studies' lead author and a Ph.D. student at Berkeley's Department of Electrical Engineering and Computer Sciences, said in a release.


Heterogeneous 360 Degree Videos in Metaverse: Differentiated Reinforcement Learning Approaches

Yu, Wenhan, Zhao, Jun

arXiv.org Artificial Intelligence

Advanced video technologies are driving the development of the futuristic Metaverse, which aims to connect users from anywhere at any time. As such, use cases will be much more diverse, leading to a mix of two types of 360-degree videos: non-VR and VR. This paper presents a novel Quality of Service model for heterogeneous 360-degree videos with different requirements on frame rate and cybersickness. We propose a frame-slotted structure and conduct frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms. Specifically, we design two structures, Separate Input Differentiated Output (SIDO) and Merged Input Differentiated Output (MIDO), for this heterogeneous scenario. We also conduct comprehensive experiments to demonstrate their effectiveness.
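The difference between the two structures can be sketched schematically: SIDO runs a separate network per video type, while MIDO shares one encoder over the merged input and differentiates only at the output heads. The callables below stand in for neural networks and are hypothetical placeholders, not the paper's architectures.

```python
def sido(obs_vr, obs_nonvr, net_vr, net_nonvr):
    # Separate Input Differentiated Output: each video type has its own
    # policy network consuming only its own observation.
    return net_vr(obs_vr), net_nonvr(obs_nonvr)

def mido(obs_vr, obs_nonvr, shared_encoder, head_vr, head_nonvr):
    # Merged Input Differentiated Output: one shared encoder over the
    # concatenated observations, with type-specific output heads.
    z = shared_encoder(obs_vr + obs_nonvr)
    return head_vr(z), head_nonvr(z)

# Toy stand-ins: sum as "encoder", simple lambdas as "heads".
a_vr, a_nonvr = mido([1], [2], sum, lambda z: z * 2, lambda z: z + 1)  # → (6, 4)
```

MIDO's shared encoder lets the two user types pool experience, while SIDO keeps their learning fully independent; the paper's experiments compare exactly this trade-off.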


WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery

Xia, Le, Sun, Yao, Liang, Chengsi, Feng, Daquan, Cheng, Runze, Yang, Yang, Imran, Muhammad Ali

arXiv.org Artificial Intelligence

Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume, along with stringent requirements on latency and reliability under limited bandwidth resources, makes untethered wireless VR delivery increasingly challenging. These bottlenecks motivate this work to explore the potential of semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users. Specifically, multiple deep learning-based modules are devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we develop a concept of a semantic location graph and leverage the joint semantic-channel coding method with knowledge sharing to substantially reduce communication latency while guaranteeing adequate transmission reliability and resilience under various channel states. Moreover, an implementation of WiserVR is presented, followed by initial simulations evaluating its performance against benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.


Short-Term Trajectory Prediction for Full-Immersive Multiuser Virtual Reality with Redirected Walking

Lemic, Filip, Struye, Jakob, Famaey, Jeroen

arXiv.org Artificial Intelligence

Full-immersive multiuser Virtual Reality (VR) envisions supporting unconstrained mobility of users in virtual worlds while constraining their physical movements inside VR setups through redirected walking. To enable delivery of high-data-rate video content in real time, the supporting wireless networks will leverage highly directional communication links that "track" the users to maintain Line-of-Sight (LoS) connectivity. Recurrent Neural Networks (RNNs), and in particular Long Short-Term Memory (LSTM) networks, have historically been suitable candidates for near-term movement-trajectory prediction of natural human mobility, and have also recently been shown to apply to predicting VR users' mobility under the constraints of redirected walking. In this work, we extend these initial findings by showing that Gated Recurrent Unit (GRU) networks, another candidate from the RNN family, generally outperform the traditionally utilized LSTMs. Second, we show that context from the virtual world, used as an additional input feature, can enhance prediction accuracy compared to the more traditional use of solely the historical physical movements of the VR users. Finally, we show that a prediction system trained on a static number of coexisting VR users can be scaled to a multi-user system without significant accuracy degradation.
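A single GRU step, the building block such a trajectory predictor rolls over the movement history, can be written out for a scalar state to show the gating that distinguishes it from an LSTM (two gates instead of three, no separate cell state). The weights below are untrained placeholders; a real predictor would use vector states, learned parameters, and an output layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, p):
    # One GRU step for scalar state h and input x; p holds the gate weights.
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])       # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])       # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1 - z) * h + z * h_tilde                       # gated state update

def predict_next(trajectory, p):
    # Roll the GRU over past positions; read the final state as the
    # prediction (a trained readout layer is omitted for brevity).
    h = 0.0
    for x in trajectory:
        h = gru_step(h, x, p)
    return h

# Hypothetical untrained weights, for illustration only.
params = dict(wz=0.5, uz=0.5, bz=0.0, wr=0.5, ur=0.5, br=0.0,
              wh=0.5, uh=0.5, bh=0.0)
pred = predict_next([0.10, 0.12, 0.15], params)
```

Appending virtual-world context as an extra input feature, as the paper proposes, simply widens `x` from the physical position alone to a position-plus-context vector.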


Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos

Chen, Xinwei, Kasgari, Ali Taleb Zadeh, Saad, Walid

arXiv.org Machine Learning

In this paper, the problem of head-movement prediction for virtual reality videos is studied. In the considered model, a deep learning network is introduced that leverages position data as well as video frame content to predict future head movement. To optimize the data input to this neural network, the data sample rate, reduced data, and long-period prediction length are also explored for this model. Simulation results show that the proposed approach yields a 16.1% improvement in prediction accuracy compared to a baseline approach that relies only on position data.