ar system
Learn to Optimize Resource Allocation under QoS Constraint of AR
Chen, Shiyong, Dai, Yuwei, Han, Shengqian
This paper studies uplink and downlink power allocation for interactive augmented reality (AR) services, in which live video captured by an AR device is uploaded to the network edge and the augmented video is then downloaded back to the device. By modeling the AR transmission process as a tandem queuing system, we derive an upper bound for the probabilistic quality-of-service (QoS) requirement on end-to-end latency and reliability. Allocating resources under these QoS constraints leads to a functional optimization problem. To solve it, we design a deep neural network that learns the power allocation policy, exploiting the structure of the optimal power allocation to improve learning performance. Simulation results demonstrate that the proposed method effectively reduces transmit power while meeting the QoS requirement.
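The paper derives an analytical upper bound on the probability that end-to-end latency exceeds its budget. As a minimal sketch of what that requirement measures (not the paper's bound or its neural network), the simulation below treats the uplink and downlink as two FIFO queues in tandem and estimates P(latency > d_max) by Monte Carlo. It assumes Poisson frame arrivals, fixed-size frames, a Shannon-rate link B·log2(1 + p·g/N0) in each direction, and no processing delay at the edge; every function name, parameter, and default value is hypothetical.

```python
import math
import random


def service_time(bits: float, bandwidth_hz: float, power_w: float,
                 gain: float, noise_w: float) -> float:
    """Seconds to send `bits` at the Shannon rate B*log2(1 + p*g/N) (assumed link model)."""
    rate_bps = bandwidth_hz * math.log2(1.0 + power_w * gain / noise_w)
    return bits / rate_bps


def violation_probability(p_up: float, p_down: float, *, frame_rate: float = 30.0,
                          n_frames: int = 20_000, bits_up: float = 2e5,
                          bits_down: float = 2e5, bandwidth_hz: float = 1e6,
                          gain: float = 1e-6, noise_w: float = 1e-9,
                          d_max: float = 0.05, seed: int = 0) -> float:
    """Monte Carlo estimate of P(end-to-end latency > d_max) for a two-stage FIFO tandem queue."""
    rng = random.Random(seed)
    s_up = service_time(bits_up, bandwidth_hz, p_up, gain, noise_w)
    s_down = service_time(bits_down, bandwidth_hz, p_down, gain, noise_w)
    arrival = dep_up = dep_down = 0.0
    late = 0
    for _ in range(n_frames):
        arrival += rng.expovariate(frame_rate)       # Poisson frame arrivals (assumed)
        dep_up = max(arrival, dep_up) + s_up         # stage 1: uplink to the edge
        dep_down = max(dep_up, dep_down) + s_down    # stage 2: downlink of the augmented frame
        if dep_down - arrival > d_max:               # end-to-end latency of this frame
            late += 1
    return late / n_frames
```

Under these assumptions, a power pair (p_up, p_down) meets a reliability target ε when `violation_probability(p_up, p_down) <= eps`, and an optimizer would look for the smallest powers satisfying that check. Because each departure time is non-decreasing in the service times, raising either power with the same seed can never increase the estimate.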
#RoboCup2024 - daily digest: 21 July
Today, 21 July, saw the competitions draw to a close in a thrilling finale. In the third and final of our round-up articles, we provide a flavour of the action from this last day. If you missed them, you can find our first two digests here: 19 July and 20 July. My first port of call this morning was the Standard Platform League, where Dr Timothy Wiley and Tom Ellis from Team RedbackBots, RMIT University, Melbourne, Australia, demonstrated an exciting advancement that is unique to their team.
Towards Transcervical Ultrasound Image Guidance for Transoral Robotic Surgery
Chen, Wanwen, Kalia, Megha, Zeng, Qi, Pang, Emily H. T., Bagherinasab, Razeyeh, Milner, Thomas D., Sabiq, Farahna, Prisman, Eitan, Salcudean, Septimiu E.
Purpose: Transoral robotic surgery (TORS) using the da Vinci surgical robot is a new minimally invasive method for treating oropharyngeal tumors, but it is a challenging operation. Augmented reality (AR) based on intraoperative ultrasound (US) has the potential to enhance visualization of the anatomy and cancerous tumors, providing additional tools for surgical decision-making. Methods: We propose and carry out preliminary evaluations of a US-guided AR system for TORS, with the transducer placed on the neck for a transcervical view. First, we perform a novel registration study between MRI and transcervical 3D US. Second, we develop a US-robot calibration method using an optical tracker, and an AR system that displays the anatomy mesh model in the real-time endoscope images inside the surgeon console. Results: Our AR system achieves mean projection errors of 26.81 and 27.85 pixels for the projection from US to the stereo cameras in a water bath experiment. The average target registration error for MRI to 3D US is 8.90 mm with the 3D US transducer and 5.85 mm with freehand 3D US, and the average distance between the vessel centerlines is 2.32 mm. Conclusion: We demonstrate the first proof-of-concept transcervical US-guided AR system for TORS and the feasibility of transcervical 3D US-MRI registration. Our results show that transcervical 3D US is a promising technique for TORS image guidance.
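Target registration error (TRE), one of the metrics reported above, is commonly computed as the mean Euclidean distance between target points mapped through the estimated registration and their counterparts in the reference image. The sketch below assumes a rigid transform (rotation R, translation t) and known point correspondences in millimetres; the abstract does not state which transform model or targets the authors used, and the function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def apply_rigid(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map point p with the rigid transform p -> R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def target_registration_error(R: Mat3, t: Vec3,
                              moving: list[Vec3], fixed: list[Vec3]) -> float:
    """Mean distance (same units as the inputs, e.g. mm) between mapped and reference targets."""
    if not moving or len(moving) != len(fixed):
        raise ValueError("need the same, non-zero number of corresponding targets")
    total = sum(math.dist(apply_rigid(R, t, p), q) for p, q in zip(moving, fixed))
    return total / len(moving)
```

For example, with the identity rotation and zero translation, targets at (0, 0, 0) matched to (0, 0, 3) and (0, 4, 0) give a TRE of 3.5 mm.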
XAIR: A Framework of Explainable AI in Augmented Reality
Xu, Xuhai, Yu, Mengjie, Jonker, Tanya R., Todi, Kashyap, Lu, Feiyu, Qian, Xun, Belo, João Marcelo Evangelista, Wang, Tianyi, Li, Michelle, Mun, Aran, Wu, Te-Yen, Shen, Junxiao, Zhang, Ting, Kokhlikyan, Narine, Wang, Fulton, Sorenson, Paul, Kim, Sophie Kahyun, Benko, Hrvoje
Explainable AI (XAI) has established itself as an important component of AI-driven interactive systems. As Augmented Reality (AR) becomes more integrated into daily life, XAI also becomes essential in AR, because end-users will frequently interact with intelligent services. However, it is unclear how to design effective XAI experiences for AR. We propose XAIR, a design framework that addresses "when", "what", and "how" to provide explanations of AI output in AR. The framework is based on a multi-disciplinary literature review of XAI and HCI research, a large-scale survey probing 500+ end-users' preferences for AR-based explanations, and three workshops with 12 experts collecting their insights about XAI design in AR. XAIR's utility and effectiveness were verified through a study with 10 designers and another study with 12 end-users. XAIR can provide guidelines for designers, inspiring them to identify new design opportunities and achieve effective XAI designs in AR.
Progress with Adversarial Attacks, Part 2 (Machine Learning)
Abstract: Contrastive vision-language representation learning has achieved state-of-the-art performance for zero-shot classification by learning from millions of image-caption pairs crawled from the internet. However, the massive data that powers large multimodal models such as CLIP makes them extremely vulnerable to various types of adversarial attacks, including targeted and backdoor data poisoning attacks. Despite this vulnerability, robust contrastive vision-language pretraining against adversarial attacks has remained unaddressed. In this work, we propose RoCLIP, the first effective method for robust pretraining and fine-tuning of multimodal vision-language models. RoCLIP effectively breaks the association between poisoned image-caption pairs by considering a pool of random examples and (1) matching every image with the text in the pool that is most similar to its caption, and (2) matching every caption with the image in the pool that is most similar to its image.
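The pool-matching step described above can be sketched with plain embedding vectors. For each pair in a batch, the image is paired with the pool text nearest to its own caption, and the caption with the pool image nearest to its own image, so each image is pulled toward a pool neighbour rather than directly toward its possibly poisoned caption. This assumes pools of precomputed, non-zero embeddings compared by cosine similarity; how RoCLIP builds and refreshes the pool and feeds these indices into its contrastive loss is not covered by the excerpt, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest_in_pool(query: Vector, pool: Sequence[Vector]) -> int:
    """Index of the pool embedding most similar to `query`."""
    return max(range(len(pool)), key=lambda k: cosine_similarity(query, pool[k]))


def pool_matched_pairs(image_embs: Sequence[Vector], caption_embs: Sequence[Vector],
                       text_pool: Sequence[Vector], image_pool: Sequence[Vector]):
    """Return (image->pool-text, caption->pool-image) index lists for a batch."""
    # (1) image i is paired with the pool text closest to its own caption i
    image_to_text = [nearest_in_pool(c, text_pool) for c in caption_embs]
    # (2) caption i is paired with the pool image closest to its own image i
    caption_to_image = [nearest_in_pool(v, image_pool) for v in image_embs]
    return image_to_text, caption_to_image
```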
An Authentic Focusing System for 'Cheap' Augmented Reality
Researchers from the Institute of Electrical and Electronics Engineers (IEEE) have developed a method to increase the authenticity of low-cost, projection-based augmented reality installations. Special glasses cause projected 3D images to go in and out of focus in the same way they would if the objects were real, overcoming a critical perceptual hurdle for the practical use of projection systems in controlled environments. The IEEE system recreates depth planes for real and projected CGI imagery superimposed into a room. In one demonstration, three CGI Stanford bunnies are superimposed at the same depth plane as three real-world objects, and their blurriness is controlled by where the viewer is looking and focusing. The system uses electrically focus-tunable lenses (ETLs) embedded in the viewer's glasses, which are in any case necessary to separate the two image streams into a convincing, integrated 3D experience. The ETLs communicate with the projection system: they report the user's focal attention, and the system then automatically sets the level of blur on a per-plane basis when rendering the projected geometry.
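The article does not say how the per-plane blur is computed. One standard geometric-optics approximation, used here purely as an assumption, is that a plane at depth d seen by an eye focused at depth f has an angular blur-circle diameter of roughly the pupil diameter times the defocus |1/d - 1/f| in diopters. A renderer could map this value to a blur-kernel size for each depth plane; the function names and the 4 mm default pupil are hypothetical.

```python
def defocus_blur_rad(plane_depth_m: float, focus_depth_m: float,
                     pupil_diameter_m: float = 0.004) -> float:
    """Approximate angular blur-circle diameter (radians) of a plane when the eye focuses elsewhere.

    Small-angle approximation: blur ~= pupil diameter (m) * |defocus| (diopters, 1/m).
    """
    if plane_depth_m <= 0 or focus_depth_m <= 0:
        raise ValueError("depths must be positive")
    defocus_diopters = abs(1.0 / plane_depth_m - 1.0 / focus_depth_m)
    return pupil_diameter_m * defocus_diopters


def per_plane_blur(plane_depths_m, focus_depth_m: float,
                   pupil_diameter_m: float = 0.004) -> dict:
    """Blur for every depth plane, given the depth the viewer is currently focused at."""
    return {d: defocus_blur_rad(d, focus_depth_m, pupil_diameter_m) for d in plane_depths_m}
```

With the viewer focused at 1 m, the plane at 1 m renders sharp, while planes at 0.5 m and 2 m receive blur proportional to their 1 D and 0.5 D of defocus.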
Fine-grained visual recognition for mobile AR technical support
When a hardware-related system disruption occurs, such as an outage due to hard drive failure, the path to recovery includes checking hardware support information, describing the problem to a support representative, waiting for a field technician to arrive, and hoping the technician can resolve the issue in a timely manner. Our team of researchers recently published a paper in IEEE ISMAR 2020 [1, 2], "Fine-Grained Visual Recognition in Mobile Augmented Reality for Technical Support," which outlines an augmented reality (AR) solution that our colleagues in IBM Technology Support Services (TSS) use to increase the rate of first-time fixes and reduce the mean time to recovery from a hardware disruption. "The most recent industry surveys have shown that the average enterprise estimates that there is an impact of approximately $8,851 for every minute of unplanned downtime in their primary computing environment." By overlaying visual guidance on the physical environment, AR support can drastically reduce the effort needed to relay instructions, the number of errors, and even the time required to look up service information. Technical support service providers typically maintain tens of thousands of products to meet the needs of their clients.
Optimal Assistance for Object-Rearrangement Tasks in Augmented Reality
Newman, Benjamin, Carlberg, Kevin, Desai, Ruta
Augmented-reality (AR) glasses with access to onboard sensors and the ability to display relevant information to the user present an opportunity to assist users in quotidian tasks. Many such tasks can be characterized as object-rearrangement tasks. We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequence with the policy of an embodied agent and (2) presenting this sequence to the user as suggestions in the AR system's heads-up display. The embodied agent is a "hybrid" of the AR system and the user, with the AR system's observation space (i.e., sensors) and the user's action space (i.e., task-execution actions); its policy is learned by minimizing the task-completion time. In this initial study, we assume that the AR system's observations include the environment's map and the localization of the objects and the user. These choices allow us to formalize the problem of computing AR assistance for any object-rearrangement task as a planning problem, specifically as a capacitated vehicle-routing problem. Further, we introduce a novel AR simulator that enables web-based evaluation of AR-like assistance and associated at-scale data collection via the Habitat simulator for embodied artificial intelligence. Finally, we perform a study that evaluates user response to the proposed form of AR assistance on a specific quotidian object-rearrangement task, house cleaning, using our proposed AR simulator on Amazon Mechanical Turk. In particular, we study the effect of the proposed AR assistance on users' task performance and sense of agency over a range of task difficulties. Our results indicate that providing users with such assistance improves their overall performance, and although users report a negative impact on their sense of agency, they may still prefer the proposed assistance to having no assistance at all.
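The abstract reduces AR assistance to a capacitated vehicle-routing problem (CVRP). As a minimal sketch, assume every object must be carried to a single receptacle (the depot, e.g. a hamper), the user can carry at most `capacity` objects per trip, and walking distance is Euclidean on a flat floor plan; none of these details come from the abstract, which would use distances from the environment's map. The exact solver below enumerates visiting orders and splits each order optimally into capacity-feasible trips, so it is only practical for a handful of objects. The function names are hypothetical.

```python
import itertools
import math

Point = tuple[float, float]


def trip_length(depot: Point, stops) -> float:
    """Length of one trip: depot -> each stop in order -> depot."""
    path = [depot, *stops, depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def best_split(depot: Point, order, capacity: int):
    """Optimally cut a fixed visiting order into consecutive trips of <= capacity objects."""
    n = len(order)
    best = [0.0] + [math.inf] * n  # best[i]: cheapest way to serve order[:i]
    cut = [0] * (n + 1)            # cut[i]: start index of the last trip in that solution
    for i in range(1, n + 1):
        for j in range(max(0, i - capacity), i):
            cost = best[j] + trip_length(depot, order[j:i])
            if cost < best[i]:
                best[i], cut[i] = cost, j
    trips, i = [], n
    while i > 0:
        trips.append(list(order[cut[i]:i]))
        i = cut[i]
    return best[n], trips[::-1]


def solve_cvrp(depot: Point, objects: list[Point], capacity: int):
    """Exact (brute-force) CVRP with unit demands: returns (total distance, list of trips)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    best_cost, best_trips = math.inf, []
    for order in itertools.permutations(objects):
        cost, trips = best_split(depot, order, capacity)
        if cost < best_cost:
            best_cost, best_trips = cost, trips
    return best_cost, best_trips
```

The returned trips are the kind of action sequence an AR display could surface as suggestions; a real system would swap the brute force for a dedicated routing solver.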
Project Titan Update: Apple Developing AR Displays For Autonomous Car
Apple could be working on AR displays for its upcoming autonomous cars. A patent application by the Cupertino giant reveals details about an AR system designed to present 3D models of the road ahead on the windshield. AppleInsider reported Thursday that it had spotted a new patent application by Apple, entitled "Adaptive Vehicle Augmented Reality Display Using Stereographic Imagery." According to the filing, which the U.S. Patent & Trademark Office published late last week, Apple is considering an AR system that generates imagery of the surrounding scenery from a pre-generated 3D model of the world. The Apple-centric news site suggests the patent could hint at an AR display for the upcoming Project Titan car. One possible purpose of the AR system is to provide information about the road ahead, including things that may be outside the driver's line of sight.
Visual SLAM algorithms: a survey from 2010 to 2016
Simultaneous Localization and Mapping (SLAM) is a technique for obtaining the 3D structure of an unknown environment and the sensor's motion within it. This technique was originally proposed to achieve autonomous control of robots in robotics [1]. Since then, SLAM-based applications have broadened to include computer vision-based online 3D modeling, augmented reality (AR)-based visualization, and self-driving cars. Early SLAM algorithms integrated many different types of sensors, such as laser range sensors, rotary encoders, inertial sensors, GPS, and cameras. Such algorithms are well summarized in [2, 3, 4, 5].
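The defining idea of SLAM is that the sensor's trajectory and the map are estimated jointly. The toy below shows that idea in its simplest linear form, assuming a robot moving along a line, odometry measurements between consecutive poses, and signed offsets to point landmarks, all weighted equally; real visual SLAM is nonlinear, 3D, and camera-based, and none of this code comes from the survey. It stacks the constraints into a least-squares problem, anchors the first pose at 0, and solves the normal equations with Gaussian elimination. The function names and measurement format are hypothetical.

```python
def solve_linear(M: list[list[float]], y: list[float]) -> list[float]:
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [y[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: is every landmark observed?")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def slam_1d(num_poses: int, num_landmarks: int, odometry, observations):
    """Jointly estimate 1D poses x_i and landmarks l_k by linear least squares.

    odometry:     (i, d)    meaning x_{i+1} - x_i = d
    observations: (i, k, z) meaning l_k - x_i = z
    """
    n = num_poses + num_landmarks
    rows, rhs = [], []

    def add(coeffs, value):
        row = [0.0] * n
        for idx, c in coeffs:
            row[idx] += c
        rows.append(row)
        rhs.append(value)

    add([(0, 1.0)], 0.0)  # anchor the first pose at the origin
    for i, d in odometry:
        add([(i + 1, 1.0), (i, -1.0)], d)
    for i, k, z in observations:
        add([(num_poses + k, 1.0), (i, -1.0)], z)
    # Normal equations: (A^T A) v = A^T b
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    Atb = [sum(r[a] * v for r, v in zip(rows, rhs)) for a in range(n)]
    sol = solve_linear(AtA, Atb)
    return sol[:num_poses], sol[num_poses:]
```

With consistent measurements, for example two 1 m odometry steps and a landmark seen 5 m ahead of the first pose and 3 m ahead of the last, the solver recovers poses 0, 1, 2 and the landmark at 5; with noisy measurements it returns the least-squares compromise.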