 Virtual Reality


Inside Anduril and Meta's quest to make smart glasses for warfare

MIT Technology Review

It's been a year since the duo entered the US Army's troubled augmented-reality contest. Here's what it looks like so far. The defense-tech company Anduril has shared new details about the augmented-reality headset for the military it's prototyping with Meta, including a vision for ordering drone strikes via eye-tracking and voice commands. Quay Barnett, who leads the efforts as a vice president at Anduril following a career in the Army's Special Operations Command, says his fundamental goal is to optimize "the human as a weapons system." The vision is undoubtedly cyborg-inspired: Barnett wants drones and soldiers to see together, share information seamlessly, and make decisions as one. Anduril actually has two such projects in the works.


Is VR gaming now dead in the water?

PCWorld

PCWorld examines whether VR gaming is declining, highlighting challenges from Meta's failed Metaverse push and lack of compelling new content. Rising AI-driven hardware costs are making Valve's upcoming Steam Frame headset potentially unaffordable, while Apple's Vision Pro lacks gaming presence. Only Valve remains committed to VR gaming among major companies, making the technology's future uncertain despite continued development efforts. Meta is looking a lot less meta lately, reportedly pivoting from the virtual reality Quest brand and the ghost of Oculus to double down on pervert glasses. After a decade of work, Sony's VR ambitions over on the PlayStation seem to have made little progress. And I've barely heard a mention of Samsung's Galaxy XR headset--allegedly the flagship launch device for Android XR--since it arrived six months ago. While the idea that Apple is abandoning its Vision Pro headset might be overblown--the company is still actively hiring for the division--Michael Simon over at Macworld tells me the platform has basically zero gaming presence for the hardware. Hope for renewed interest in VR gaming with a big injection of Cupertino branding power has evaporated. Is virtual reality gaming, to borrow a term from, cooked?


Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

Neural Information Processing Systems

Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information. It is becoming more significant with the increasing advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process. To tackle this issue, we propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task, combining content and details information to generate multiple pseudo viewport sequences from a given starting point.


PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas

Neural Information Processing Systems

Achieving an immersive experience enabling users to explore virtual environments with six degrees of freedom (6DoF) is essential for various applications such as virtual reality (VR). Wide-baseline panoramas are commonly used in these applications to reduce network bandwidth and storage requirements. However, synthesizing novel views from these panoramas remains a key challenge. Although existing neural radiance field methods can produce photorealistic views under narrow-baseline and dense image captures, they tend to overfit the training views when dealing with wide-baseline panoramas due to the difficulty in learning accurate geometry from sparse 360 views. To address this problem, we propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas, which construct spherical radiance fields incorporating 360 scene priors. Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion and directly aggregates geometry and appearance features of 3D sample points from each panoramic view based on spherical projection. Moreover, as some regions of the panorama are only visible from one view while invisible from others under wide baseline settings, PanoGRF incorporates 360 monocular depth priors into spherical depth estimation to improve the geometry features. Experimental results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods for wide-baseline panoramas (e.g., OmniSyn) and perspective images (e.g., IBRNet, NeuRay).
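The spherical projection mentioned in the abstract can be illustrated with the standard equirectangular mapping from a 3D point to panorama pixel coordinates. The sketch below is not taken from the paper: the function name `project_to_equirect`, the axis convention (+z forward, +x right, +y down), and the pixel layout (longitude 0 at the image center) are assumptions chosen for illustration.

```python
import math


def project_to_equirect(point, width, height):
    """Map a 3D point (x, y, z) to equirectangular pixel coordinates (u, v).

    Assumed convention (hypothetical, not from the paper):
    +z is forward, +x is right, +y is down, and longitude 0
    falls on the horizontal center of the panorama.
    """
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("a point at the camera center has no direction")

    lon = math.atan2(x, z)   # longitude in (-pi, pi], 0 means straight ahead
    lat = math.asin(y / r)   # latitude in [-pi/2, pi/2], positive means down

    # Scale angles linearly onto the image grid.
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (lat / math.pi + 0.5) * height
    return u, v
```

Under these assumptions, a point straight ahead of the camera lands at the image center, and a point directly to the right lands three quarters of the way across the panorama. Because the mapping depends only on direction, every sample along a ray from the camera center projects to the same pixel.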


Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions

Neural Information Processing Systems

Understanding how humans interact with each other is key to building realistic multi-human virtual reality systems. This area remains relatively unexplored due to the lack of large-scale datasets. Recent datasets focusing on this issue mainly consist of activities captured entirely in controlled indoor environments with choreographed actions, significantly affecting their diversity. To address this, we introduce Harmony4D, a multi-view video dataset for human-human interaction featuring in-the-wild activities such as wrestling, dancing, MMA, and more. We use a flexible multi-view capture system to record these dynamic activities and provide annotations for human detection, tracking, 2D/3D pose estimation, and mesh recovery for closely interacting subjects. We propose a novel markerless algorithm to track 3D human poses in severe occlusion and close interaction to obtain our annotations with minimal manual intervention. Harmony4D consists of 1.66 million images and 3.32 million human instances from more than 20 synchronized cameras with 208 video sequences spanning diverse environments and 24 unique subjects. We rigorously evaluate existing state-of-the-art methods for mesh recovery and highlight their significant limitations in modeling close interaction scenarios. Additionally, we fine-tune a pre-trained HMR2.0 model on Harmony4D and demonstrate an improved performance of 54.8% PVE in scenes with severe occlusion and contact.
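PVE (per-vertex error) is commonly computed as the mean Euclidean distance between corresponding predicted and ground-truth mesh vertices. The sketch below shows that common definition only. The function name `per_vertex_error` is hypothetical, and the abstract does not state the evaluation protocol (alignment, units), so the code assumes the two vertex lists are already in correspondence and expressed in the same units.

```python
import math


def per_vertex_error(pred_vertices, gt_vertices):
    """Mean Euclidean distance between corresponding mesh vertices.

    Both arguments are sequences of (x, y, z) tuples. Assumes vertex i
    in the prediction corresponds to vertex i in the ground truth.
    """
    if len(pred_vertices) != len(gt_vertices):
        raise ValueError("meshes must have the same number of vertices")
    if not pred_vertices:
        raise ValueError("meshes must contain at least one vertex")

    total = sum(math.dist(p, g) for p, g in zip(pred_vertices, gt_vertices))
    return total / len(pred_vertices)
```

A lower value means the recovered mesh lies closer to the reference mesh, so an improvement in PVE corresponds to a reduction in this distance.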


Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization

Neural Information Processing Systems

No-Reference Point Cloud Quality Assessment (NR-PCQA) aims to objectively assess the human perceptual quality of point clouds without relying on pristine-quality point clouds for reference. It is becoming increasingly significant with the rapid advancement of immersive media applications such as virtual reality (VR) and augmented reality (AR). However, current NR-PCQA models attempt to indiscriminately learn point cloud content and distortion representations within a single network, overlooking their distinct contributions to quality information. To address this issue, we propose DisPA, a novel disentangled representation learning framework for NR-PCQA. The framework trains a dual-branch disentanglement network to minimize mutual information (MI) between representations of point cloud content and distortion. Specifically, to fully disentangle representations, the two branches adopt different philosophies: the content-aware encoder is pretrained by a masked auto-encoding strategy, which can allow the encoder to capture semantic information from rendered images of distorted point clouds; the distortion-aware encoder takes a mini-patch map as input, which forces the encoder to focus on low-level distortion patterns. Furthermore, we utilize an MI estimator to estimate the tight upper bound of the actual MI and further minimize it to achieve explicit representation disentanglement. Extensive experimental results demonstrate that DisPA outperforms state-of-the-art methods on multiple PCQA datasets.


EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning

Neural Information Processing Systems

EEVR (Emotion Elicitation in Virtual Reality) is a novel dataset specifically designed for language supervision-based pre-training of emotion recognition tasks, such as valence and arousal classification. It features high-quality physiological signals, including electrodermal activity (EDA) and photoplethysmography (PPG), acquired through emotion elicitation via 360-degree virtual reality (VR) videos. Additionally, it includes subject-wise textual descriptions of emotions experienced during each stimulus gathered from qualitative interviews. The dataset consists of recordings from 37 participants and is the first dataset to pair raw text with physiological signals, providing additional contextual information that objective labels cannot offer. To leverage this dataset, we introduced the Contrastive Language Signal Pre-training (CLSP) method, which jointly learns representations using pairs of physiological signals and textual descriptions. Our results show that integrating self-reported textual descriptions with physiological signals significantly improves performance on emotion recognition tasks, such as arousal and valence classification. Moreover, our pre-trained CLSP model demonstrates strong zero-shot transferability to existing datasets, outperforming supervised baseline models, suggesting that the representations learned by our method are more contextualized and generalized. The dataset also includes baseline models for arousal, valence, and emotion classification, as well as code for data cleaning and feature extraction.
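The abstract does not spell out the CLSP training objective. As an illustration of how paired signal and text embeddings can be learned jointly, the sketch below implements a symmetric contrastive (InfoNCE-style) loss of the kind used in CLIP-like pre-training. This choice of loss, the function names, and the temperature value of 0.07 are all assumptions, not details confirmed by the source.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(signal_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style loss over a batch of paired embeddings.

    signal_emb[i] and text_emb[i] are treated as a matching pair;
    every other combination in the batch is treated as a negative.
    """
    s = [_normalize(v) for v in signal_emb]
    t = [_normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(si, tj)) / temperature for tj in t]
        for si in s
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the signal-to-text and text-to-signal directions.
    return 0.5 * (
        _diagonal_cross_entropy(logits)
        + _diagonal_cross_entropy(logits_transposed)
    )
```

With this objective, the loss is near zero when each signal embedding is most similar to its own description, and it grows when the pairs are mismatched.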


Meta Is Shutting Down Horizon Worlds on Meta Quest

WIRED

Meta's flailing virtual reality social experience is being discontinued in June. It's part of Meta's broader moves to slim down the business that became its namesake. Pour one out from your digital bottle, because Meta is shutting down the virtual reality experience of Horizon Worlds. Meta sent an email blast to Horizon Worlds users today stating that the social VR world will officially end on its Quest VR headsets; starting March 31, Horizon Worlds will no longer be in the Quest store. Some Horizon-specific perks, including Meta Credits, avatars, and some digital clothes and in-world purchases, will also be removed.


Category

Neural Information Processing Systems

Estimating the 6D object pose is one of the core problems in computer vision and robotics. It predicts the full configurations of rotation, translation and size of a given object, which has wide applications including Virtual Reality (VR) [2], scene understanding [30], and [42, 57, 31, 49]. There are two directions in 6D object pose estimation.