 augmented reality


Spatiotemporal Calibration and Ground Truth Estimation for High-Precision SLAM Benchmarking in Extended Reality

Shu, Zichao, Bei, Shitao, Li, Lijun, Chen, Zetao

arXiv.org Artificial Intelligence

Simultaneous localization and mapping (SLAM) plays a fundamental role in extended reality (XR) applications. As the standards for immersion in XR continue to increase, the demands for SLAM benchmarking have become more stringent. Trajectory accuracy is the key metric, and marker-based optical motion capture (MoCap) systems are widely used to generate ground truth (GT) because of their drift-free and relatively accurate measurements. However, the precision of MoCap-based GT is limited by two factors: the spatiotemporal calibration with the device under test (DUT) and the inherent jitter in the MoCap measurements. These limitations hinder accurate SLAM benchmarking, particularly for key metrics like rotation error and inter-frame jitter, which are critical for immersive XR experiences. This paper presents a novel continuous-time maximum likelihood estimator to address these challenges. The proposed method integrates auxiliary inertial measurement unit (IMU) data to compensate for MoCap jitter. Additionally, a variable time synchronization method and a pose residual based on screw congruence constraints are proposed, enabling precise spatiotemporal calibration across multiple sensors and the DUT. Experimental results demonstrate that our approach outperforms existing methods, achieving the precision necessary for comprehensive benchmarking of state-of-the-art SLAM algorithms in XR applications. Furthermore, we thoroughly validate the practicality of our method by benchmarking several leading XR devices and open-source SLAM algorithms. The code is publicly available at https://github.com/ylab-xrpg/xr-hpgt.


A Virtual Mechanical Interaction Layer Enables Resilient Human-to-Robot Object Handovers

Faris, Omar, Tadeja, Sławomir, Forni, Fulvio

arXiv.org Artificial Intelligence

Object handover is a common form of interaction that is widely present in collaborative tasks. However, achieving it efficiently remains a challenge. We address the problem of ensuring resilient robotic actions that can adapt to complex changes in object pose during human-to-robot object handovers. We propose the use of Virtual Model Control to create an interaction layer that controls the robot and adapts to the dynamic changes in the handover process. Additionally, we propose the use of augmented reality to facilitate bidirectional communication between humans and robots during handovers. We assess the performance of our controller in a set of experiments that demonstrate its resilience to various sources of uncertainties, including complex changes to the object's pose during the handover. Finally, we performed a user study with 16 participants to understand human preferences for different robot control profiles and augmented reality visuals in object handovers. Our results showed a general preference for the proposed approach and revealed insights that can guide further development in adapting the interaction with the user. Human-to-robot object handover is a fundamental task that frequently occurs in collaborative manipulation.


Generative Augmented Reality: Paradigms, Technologies, and Future Applications

Liang, Chen, Zheng, Jiawen, Zeng, Yufeng, Tan, Yi, Lyu, Hengye, Zheng, Yuhui, Li, Zisu, Weng, Yueting, Shi, Jiaxin, Zhang, Hanwang

arXiv.org Artificial Intelligence

This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conventional AR engine. GAR replaces the conventional AR engine's multi-stage modules with a unified generative backbone, where environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-time generative augmentation feasible, and outline prospective applications that leverage its unified inference model. We envision GAR as a future AR paradigm that delivers high-fidelity experiences in terms of realism, interactivity, and immersion, while eliciting new research challenges on technologies, content ecosystems, and the ethical and societal implications.


AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly

Kyaw, Alexander Htet, Ma, Haotian, Zivkovic, Sasa, Sabin, Jenny

arXiv.org Artificial Intelligence

We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition to identify different assembly components and display step-by-step instructions. For each assembly step, the system displays a bounding box around the corresponding components in the physical space and indicates where each component should be placed. By connecting assembly instructions with the real-time location of relevant components, the system eliminates the need for manual searching, sorting, or labeling of different components before each assembly. To demonstrate the feasibility of using object recognition for AR-assisted assembly, we highlight a case study involving the assembly of LEGO sculptures.


EEG-Driven AR-Robot System for Zero-Touch Grasping Manipulation

Wang, Junzhe, Xie, Jiarui, Hao, Pengfei, Li, Zheng, Cai, Yi

arXiv.org Artificial Intelligence

Reliable brain-computer interface (BCI) control of robots provides an intuitive and accessible means of human-robot interaction, particularly valuable for individuals with motor impairments. However, existing BCI-Robot systems face major limitations: electroencephalography (EEG) signals are noisy and unstable, target selection is often predefined and inflexible, and most studies remain restricted to simulation without closed-loop validation. These issues hinder real-world deployment in assistive scenarios. To address them, we propose a closed-loop BCI-AR-Robot system that integrates motor imagery (MI)-based EEG decoding, augmented reality (AR) neurofeedback, and robotic grasping for zero-touch operation. A 14-channel EEG headset enabled individualized MI calibration, a smartphone-based AR interface supported multi-target navigation with direction-congruent feedback to enhance stability, and the robotic arm combined decision outputs with vision-based pose estimation for autonomous grasping. Experiments are conducted to validate the framework: MI training achieved 93.1 percent accuracy with an average information transfer rate (ITR) of 14.8 bit/min; AR neurofeedback significantly improved sustained control (SCI = 0.210) and achieved the highest ITR (21.3 bit/min) compared with static, sham, and no-AR baselines; and closed-loop grasping achieved a 97.2 percent success rate with good efficiency and strong user-reported control. These results show that AR feedback substantially stabilizes EEG-based control and that the proposed framework enables robust zero-touch grasping, advancing assistive robotic applications and future modes of human-robot interaction.
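The abstract reports ITR values in bit/min but does not say how they were computed. As a rough illustration only, the sketch below uses the Wolpaw definition, which is common in BCI work; whether the authors used it is an assumption. The function names and the `n_classes` and `trial_seconds` parameters are hypothetical and do not come from the paper.

```python
import math


def wolpaw_bits_per_trial(accuracy: float, n_classes: int) -> float:
    """Bits conveyed by one selection under the Wolpaw ITR definition.

    Assumes all classes are equally likely and that errors are spread
    evenly over the wrong classes.
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")

    p = accuracy
    bits = math.log2(n_classes)
    # Skip terms whose coefficient is zero to avoid log2(0).
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_classes - 1))
    return bits


def itr_bits_per_min(accuracy: float, n_classes: int, trial_seconds: float) -> float:
    """Scale bits per trial by the number of trials completed per minute."""
    if trial_seconds <= 0.0:
        raise ValueError("trial_seconds must be positive")
    return wolpaw_bits_per_trial(accuracy, n_classes) * 60.0 / trial_seconds
```

Under this definition, higher accuracy, more selectable targets, and shorter trials each raise the ITR, so values are comparable across conditions only when the class count and trial duration are held fixed.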


This program is using augmented reality to teach preschoolers spatial awareness

Los Angeles Times

A child uses a tablet to play an augmented reality game meant to teach spatial awareness. Spatial thinking concepts are a part of early math that have largely been absent from preschool curricula.


An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance

Yang, Hsuan-Kung, Hsiao, Tsu-Ching, Oka, Ryoichiro, Nishino, Ryuya, Tofukuji, Satoko, Kobori, Norimasa

arXiv.org Artificial Intelligence

Delivering intelligent and adaptive navigation assistance in augmented reality (AR) requires more than visual cues, as it demands systems capable of interpreting flexible user intent and reasoning over both spatial and semantic context. Prior AR navigation systems often rely on rigid input schemes or predefined commands, which limit the utility of rich building data and hinder natural interaction. In this work, we propose an embodied AR navigation system that integrates Building Information Modeling (BIM) with a multi-agent retrieval-augmented generation (RAG) framework to support flexible, language-driven goal retrieval and route planning. The system orchestrates three language agents, Triage, Search, and Response, built on large language models (LLMs), which enables robust interpretation of open-ended queries and spatial reasoning using BIM data. Navigation guidance is delivered through an embodied AR agent, equipped with voice interaction and locomotion, to enhance user experience. A real-world user study yields a System Usability Scale (SUS) score of 80.5, indicating excellent usability, and comparative evaluations show that the embodied interface can significantly improve users' perception of system intelligence. These results underscore the importance and potential of language-grounded reasoning and embodiment in the design of user-centered AR navigation systems.
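For readers unfamiliar with how a SUS score such as the 80.5 above is obtained, the following is a minimal sketch of the standard SUS scoring rule for a single respondent; a study-level score is typically the mean over respondents. The function name `sus_score` is hypothetical, and the abstract does not describe the authors' own scoring procedure.

```python
def sus_score(responses):
    """Score one 10-item SUS questionnaire.

    `responses` holds the ten answers in questionnaire order, each an
    integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute (response - 1).
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute (5 - response).
            total += 5 - response
    # The 0..40 raw total is rescaled to 0..100.
    return total * 2.5
```

A respondent who answers 5 on every odd item and 1 on every even item reaches the maximum of 100, while neutral answers (all 3s) give 50.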


GhostObjects: Instructing Robots by Manipulating Spatially Aligned Virtual Twins in Augmented Reality

Wang, Lauren W., Abtahi, Parastoo

arXiv.org Artificial Intelligence

Robots are increasingly capable of autonomous operations, yet human interaction remains essential for issuing personalized instructions. Instead of directly controlling robots through Programming by Demonstration (PbD) or teleoperation, we propose giving instructions by interacting with GhostObjects (world-aligned, life-size virtual twins of physical objects) in augmented reality (AR). By direct manipulation of GhostObjects, users can precisely specify physical goals and spatial parameters, with features including real-world lasso selection of multiple objects and snapping back to default positions, enabling tasks beyond simple pick-and-place.


Designing Memory-Augmented AR Agents for Spatiotemporal Reasoning in Personalized Task Assistance

Choi, Dongwook, Kwon, Taeyoon, Yang, Dongil, Kim, Hyojun, Yeo, Jinyoung

arXiv.org Artificial Intelligence

Augmented Reality (AR) systems are increasingly integrating foundation models, such as Multimodal Large Language Models (MLLMs), to provide more context-aware and adaptive user experiences. This integration has led to the development of AR agents to support intelligent, goal-directed interactions in real-world environments. While current AR agents effectively support immediate tasks, they struggle with complex multi-step scenarios that require understanding and leveraging users' long-term experiences and preferences. This limitation stems from their inability to capture, retain, and reason over historical user interactions in spatiotemporal contexts. To address these challenges, we propose a conceptual framework for memory-augmented AR agents that can provide personalized task assistance by learning from and adapting to user-specific experiences over time. Our framework consists of four interconnected modules: (1) Perception Module for multimodal sensor processing, (2) Memory Module for persistent spatiotemporal experience storage, (3) Spatiotemporal Reasoning Module for synthesizing past and present contexts, and (4) Actuator Module for effective AR communication. We further present an implementation roadmap, a future evaluation strategy, and a potential target application with use cases to demonstrate the practical applicability of our framework across diverse domains. We aim for this work to motivate future research toward developing more intelligent AR systems that can effectively bridge users' interaction history with adaptive, context-aware task assistance.


Agency, Affordances, and Enculturation of Augmentation Technologies

Duin, Ann Hill, Pedersen, Isabel

arXiv.org Artificial Intelligence

Augmentation technologies are undergoing a process of enculturation due to many factors, one being the rise of artificial intelligence (AI), or what the World Intellectual Property Organization (WIPO) terms the AI wave or AI boom. Chapter 3 focuses critical attention on the hyped assumption that sophisticated, emergent, and embodied augmentation technologies will improve lives, literacy, cultures, arts, economies, and social contexts. The chapter begins by discussing the problem of ambiguity with AI terminology, which it addresses with a description of the WIPO Categorization of AI Technologies Scheme. It then draws on media and communication studies to explore concepts such as agents, agency, power, and agentive relationships between humans and robots. The chapter focuses on the development of non-human agents in industry as a critical factor in the rise of augmentation technologies. It looks at how marketing communication enculturates future users to adopt and adapt to the technology. Scholars are charting the significant ways that people are drawn further into commercial digital landscapes, such as the Metaverse concept, in post-internet society. The chapter concludes by examining recent claims concerning the Metaverse and augmented reality.