Dong, Haiwei
MetaDecorator: Generating Immersive Virtual Tours through Multimodality
Xie, Shuang, Liu, Yang, Lee, Jeannie S. A., Dong, Haiwei
Abstract--MetaDecorator is a framework that empowers users to personalize virtual spaces. By leveraging text-driven prompts and image synthesis techniques, MetaDecorator adorns static panoramas captured by 360° imaging devices, transforming them into uniquely styled and visually appealing environments. This significantly enhances the realism and engagement of virtual tours compared with traditional offerings. Beyond the core framework, we also discuss the integration of Large Language Models (LLMs) and haptics into the VR application to provide a more immersive experience. As shown in FIGURE 1, the framework consists of two main stages.
Leveraging LLMs to Create a Haptic Devices' Recommendation System
Liu, Yang, Dong, Haiwei, Saddik, Abdulmotaleb El
Haptic technology has seen significant growth, yet a lack of awareness of existing haptic device design knowledge hinders development. This paper addresses this limitation by leveraging advances in Large Language Models (LLMs) to develop a haptic agent, focusing specifically on recommending Grounded Force Feedback (GFF) devices. Our approach automates the creation of a structured haptic device database from information in research papers and product specifications. This database enables the recommendation of relevant GFF devices based on user queries. To ensure precise and contextually relevant recommendations, the system employs a dynamic retrieval method that combines conditional and semantic searches. Evaluated with the established User Experience Questionnaire (UEQ) benchmark and compared against existing haptic device search tools, the proposed haptic recommendation agent ranks in the top 10% across all UEQ categories, with mean differences favoring the agent in nearly all subscales, and shows no significant performance bias across different user groups, demonstrating superior usability and user satisfaction.
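The dynamic retrieval that combines conditional and semantic searches can be illustrated with a minimal sketch: hard constraints on structured fields first narrow the candidates, and a similarity score then ranks the survivors against the free-text query. The schema (`dof`, `max_force_n`, `description`), the device names, and the bag-of-words cosine similarity (standing in for the LLM embeddings a real agent would likely use) are all hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical schema for one row of a structured haptic device database.
    name: str
    dof: int             # degrees of freedom
    max_force_n: float   # peak force in newtons
    description: str


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(devices, query: str, min_dof: int = 0, min_force_n: float = 0.0, k: int = 3):
    """Conditional search on structured fields, then semantic ranking of the remaining devices."""
    candidates = [d for d in devices if d.dof >= min_dof and d.max_force_n >= min_force_n]
    q = tokenize(query)
    ranked = sorted(candidates, key=lambda d: cosine(q, tokenize(d.description)), reverse=True)
    return [d.name for d in ranked[:k]]


# Toy catalog with made-up entries.
catalog = [
    Device("DeviceA", 3, 3.3, "desktop pen-style grounded force feedback for surgical training"),
    Device("DeviceB", 6, 12.0, "high-force grounded arm for teleoperation of industrial robots"),
    Device("DeviceC", 3, 7.9, "parallel delta grounded device for teleoperation and virtual assembly"),
]
print(recommend(catalog, "teleoperation of robots", min_dof=3, min_force_n=5.0))  # ['DeviceB', 'DeviceC']
```

Filtering before ranking keeps hard requirements (such as a minimum force) from being traded off against wording similarity.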
MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction
Wang, Haopeng, Long, Zijian, Dong, Haiwei, Saddik, Abdulmotaleb El
Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single-viewpoint prediction to reduce bandwidth consumption. However, the performance of single-viewpoint prediction models is severely limited by the inherent uncertainty of head movement, and such models cannot cope well with sudden user movements. This paper first presents a multimodal spatial-temporal attention transformer that generates multiple viewpoint trajectories, together with their probabilities, from a historical trajectory. The proposed method models viewpoint prediction as a classification problem and uses attention mechanisms to capture the spatial and temporal characteristics of input video frames and viewpoint trajectories for multi-viewpoint prediction. Building on this, we propose a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm that uses multi-viewpoint prediction to maximize different QoE objectives for 360° video streaming under various network conditions. We formulate the ABR problem as a decentralized partially observable Markov decision process (Dec-POMDP) and present a multi-agent proximal policy optimization (MAPPO) algorithm based on the centralized training and decentralized execution (CTDE) framework to solve it. Experimental results show that the proposed method improves the defined QoE metric by up to 85.5% compared with existing ABR methods.
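To see why several predicted viewpoints help tile-based ABR, consider a minimal sketch that turns K predicted viewpoints and their probabilities into per-tile viewing probabilities, then spends a bandwidth budget on the most likely tiles first. The 4x8 equirectangular grid, the rectangular field-of-view test, the bitrate ladder, and the greedy rule are illustrative assumptions; the paper learns bitrate decisions with MAPPO rather than a fixed rule.

```python
from typing import List, Tuple

# Hypothetical equirectangular tiling: 4 rows x 8 columns of 45 x 45 degree tiles.
ROWS, COLS = 4, 8
TILE_W, TILE_H = 360 / COLS, 180 / ROWS
BITRATES_KBPS = [300, 750, 1500, 3000]  # illustrative per-tile bitrate ladder


def tiles_in_view(yaw: float, pitch: float, fov_w: float = 90.0, fov_h: float = 90.0) -> List[Tuple[int, int]]:
    """Tiles whose centres fall inside a rectangular viewport centred at (yaw, pitch)."""
    visible = []
    for r in range(ROWS):
        for c in range(COLS):
            cx = c * TILE_W + TILE_W / 2 - 180   # tile-centre yaw in [-180, 180)
            cy = 90 - (r * TILE_H + TILE_H / 2)  # tile-centre pitch in (-90, 90)
            dyaw = (cx - yaw + 180) % 360 - 180  # yaw difference with wrap-around
            if abs(dyaw) <= fov_w / 2 and abs(cy - pitch) <= fov_h / 2:
                visible.append((r, c))
    return visible


def tile_probabilities(viewpoints):
    """viewpoints: list of (yaw, pitch, probability) predicted for the next segment."""
    prob = [[0.0] * COLS for _ in range(ROWS)]
    for yaw, pitch, p in viewpoints:
        for r, c in tiles_in_view(yaw, pitch):
            prob[r][c] += p
    return prob


def allocate(prob, budget_kbps: float):
    """Start every tile at the lowest rate, then upgrade tiles in order of viewing probability."""
    level = [[0] * COLS for _ in range(ROWS)]
    spent = BITRATES_KBPS[0] * ROWS * COLS
    order = sorted(((prob[r][c], r, c) for r in range(ROWS) for c in range(COLS)), reverse=True)
    for p, r, c in order:
        if p == 0:
            break  # remaining tiles are predicted to be invisible
        while level[r][c] + 1 < len(BITRATES_KBPS):
            extra = BITRATES_KBPS[level[r][c] + 1] - BITRATES_KBPS[level[r][c]]
            if spent + extra > budget_kbps:
                break
            level[r][c] += 1
            spent += extra
    return level, spent


# Two predicted viewpoints: straight ahead (70%) and directly behind (30%).
probs = tile_probabilities([(0.0, 0.0, 0.7), (180.0, 0.0, 0.3)])
levels, spent = allocate(probs, budget_kbps=20400)
print(spent)  # 20400: the four tiles around yaw 0 reach 3000 kbps, the rest stay at 300 kbps
```

With a single-viewpoint predictor, the tiles behind the user would receive zero probability and never be upgraded; the mixture keeps the less likely but plausible region in the ranking once budget allows.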
Bringing Robots Home: The Rise of AI Robots in Consumer Electronics
Dong, Haiwei, Liu, Yang, Chu, Ted, Saddik, Abdulmotaleb El
On March 18, 2024, NVIDIA unveiled Project GR00T, a general-purpose multimodal generative AI model designed specifically for training humanoid robots. Preceding this event, Tesla's unveiling of the Optimus Gen 2 humanoid robot on December 12, 2023, underscored the profound impact robotics is poised to have on many facets of our daily lives. While robots have long dominated industrial settings, their presence in our homes is only now emerging. This slower arrival can be attributed, in part, to the complexity of domestic environments and the challenge of creating robots that integrate seamlessly into our daily routines.
Human-Centric Resource Allocation for the Metaverse With Multiaccess Edge Computing
Long, Zijian, Dong, Haiwei, Saddik, Abdulmotaleb El
Multi-access edge computing (MEC) is a promising solution to the computation-intensive, low-latency rendering tasks of the metaverse. However, optimally allocating limited communication and computation resources at the edge to a large number of metaverse users is challenging. In this paper, we propose an adaptive edge resource allocation method based on multi-agent soft actor-critic with graph convolutional networks (SAC-GCN). Specifically, SAC-GCN models the multi-user metaverse environment as a graph in which each agent is represented by a node. Each agent learns the interplay between agents through graph convolutional networks with a self-attention mechanism and uses it to determine the resource usage of one user in the metaverse. The effectiveness of SAC-GCN is demonstrated through an analysis of user experience, balance of resource allocation, and resource utilization rate, using a virtual city park metaverse as an example. Experimental results indicate that SAC-GCN outperforms other resource allocation methods, improving overall user experience, balance of resource allocation, and resource utilization rate by at least 27%, 11%, and 8%, respectively.
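The step in which each agent aggregates information from neighbouring agents through a graph convolution with self-attention can be sketched in plain Python. This is a generic single-head, graph-attention-style layer (scaled dot-product scores restricted to graph neighbours, softmax, weighted sum of neighbour values), not the paper's exact SAC-GCN architecture; the node features and the projection matrices `wq`, `wk`, `wv` are placeholders that a trained network would learn.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def graph_self_attention(h: Matrix, adj: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """One attention-weighted graph convolution.

    h:   node features, one row per agent (user)
    adj: 0/1 adjacency matrix; every agent also attends to itself (implicit self-loop)
    """
    q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
    d = len(q[0])
    out = []
    for i in range(len(h)):
        nbrs = [j for j in range(len(h)) if adj[i][j] or i == j]
        scores = [sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d) for j in nbrs]
        alpha = softmax(scores)  # attention weights over the neighbourhood
        out.append([sum(a * v[j][t] for a, j in zip(alpha, nbrs)) for t in range(len(v[0]))])
    return out


# Three agents on a path graph 0-1-2 with identity projections (illustrative only).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out = graph_self_attention(h, adj, eye, eye, eye)
print([round(x, 2) for x in out[0]])  # [0.67, 0.33]: agent 0 weights itself above agent 1
```

In an actor-critic setting, each agent's policy head would consume its row of `out`, so its resource decision reflects neighbouring users' states.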
A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers
Ketabi, Shiva, Chen, Hongkai, Dong, Haiwei, Ganjali, Yashar
Various congestion control protocols have been designed to achieve high performance in different network environments. Modern online learning solutions that delegate congestion control actions to a machine cannot converge properly within the stringent time scales of data centers. We leverage multiagent reinforcement learning to design a system for dynamically tuning congestion control parameters at end-hosts in a data center. The system includes agents at the end-hosts that monitor and report network and traffic states, and agents that run the reinforcement learning algorithm on those states. Based on the state of the environment, the system generates congestion control parameters that optimize network performance metrics such as throughput and latency. As a case study, we examine BBR, a prominent, recently developed congestion control protocol. Our experiments demonstrate that the proposed system has the potential to mitigate the problems caused by static parameter settings.
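The idea of letting an agent choose congestion control parameters from observed throughput and latency can be illustrated with a deliberately simplified sketch: an epsilon-greedy bandit that picks among a few candidate values of a single BBR-style congestion-window gain, rewarded by throughput minus a latency penalty. The paper uses multiagent reinforcement learning over network and traffic states; the candidate gains, `LATENCY_WEIGHT`, and the `toy_network` response model here are hypothetical stand-ins, not measurements.

```python
import random

CANDIDATE_GAINS = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical values of a BBR-style cwnd gain
LATENCY_WEIGHT = 0.5                         # assumed trade-off between Gbps and ms


def reward(throughput_gbps: float, latency_ms: float) -> float:
    """Higher throughput is rewarded, higher latency is penalized."""
    return throughput_gbps - LATENCY_WEIGHT * latency_ms


def toy_network(gain: float, rng: random.Random):
    """Made-up response: throughput saturates at 10 Gbps while queueing delay grows past gain 2."""
    throughput = min(10.0, 5.0 * gain) + rng.gauss(0, 0.1)
    latency = 1.0 + 2.0 * max(0.0, gain - 2.0) ** 2 + abs(rng.gauss(0, 0.05))
    return throughput, latency


def tune(steps: int = 2000, epsilon: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    counts = [0] * len(CANDIDATE_GAINS)
    values = [0.0] * len(CANDIDATE_GAINS)  # running mean reward per candidate gain
    for _ in range(steps):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(len(CANDIDATE_GAINS))                # explore
        else:
            arm = max(range(len(values)), key=values.__getitem__)   # exploit
        r = reward(*toy_network(CANDIDATE_GAINS[arm], rng))
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]              # incremental mean
    return CANDIDATE_GAINS[max(range(len(values)), key=values.__getitem__)]


print(tune())  # 2.0, the knee of this toy model
```

A static setting would keep whatever gain was configured; the learner instead settles on the value that balances throughput and queueing delay for the (here simulated) conditions.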
Development of an automatic 3D human head scanning-printing system
Zhang, Longyu, Han, Bote, Dong, Haiwei, Saddik, Abdulmotaleb El
In anthropological studies, researchers have been investigating the relationship between facial shape variations and neurological and psychiatric disorders. For example, Hennessy et al. used 3D head models acquired from laser scanners to identify schizophrenia from facial dysmorphic features [3]. Qi et al. proposed a fast algorithm for 3D face reconstruction using uncalibrated photometric stereo [4]. Human avatar animation has also become popular with the development of 3D graphics and gaming. Lee and Magnenat-Thalmann introduced a method to reconstruct 3D facial models for animation from two orthogonal images (frontal and profile views) or from range data [5]. Kan and Ferko adopted the same principle to build an automatic system that combines facial feature matching across two images with a parametrized head model to create 3D head models as avatars in 3D games [6]. The head model is an important part of a 3D human model and can be used to establish design standards for products that fit onto the face or head, such as respiratory masks, glasses, helmets, and other head-mounted devices [7]. An interesting initiative was the Size-China project [8,9]. To find the proper fit for Asians, whose head shapes differ from those of Westerners, in face and head products such as helmets, face masks, and caps, and to derive standards from an anthropometric database, Ball et al. created an Asian anthropometric database built from 3D scans of 2000 Asian people using a stationary head and face color 3D scanner by Cyberware.
Development of a Self-Calibrated Motion Capture System by Nonlinear Trilateration of Multiple Kinects v2
Yang, Bowen, Dong, Haiwei, Saddik, Abdulmotaleb El
In this paper, a Kinect-based distributed and real-time motion capture system is developed. A trigonometric method is applied to calculate the relative positions of Kinect v2 sensors with a calibration wand and to register the sensors' positions automatically. By combining results from multiple sensors with a nonlinear least-squares method, the accuracy of the motion capture is optimized. Moreover, to exclude inaccurate results from sensors, a computational geometry method is applied in the occlusion approach to detect occluded joint data. The synchronization approach is based on the Network Time Protocol (NTP), which dynamically synchronizes the clocks of the server and clients so that the proposed system operates in real time. Experiments validating the proposed system are conducted from the perspectives of calibration, occlusion, accuracy, and efficiency. Furthermore, to demonstrate the practical performance of our system, we compare it and two previously developed motion capture systems (the linear trilateration approach and the geometric trilateration approach) against the benchmark OptiTrack system, showing that the accuracy of our proposed system is 38.3% and 24.1% better than the two aforementioned trilateration systems, respectively.
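Nonlinear trilateration, in its generic form, estimates a point from its distances to several known positions by minimizing the sum of squared range residuals. The sketch below applies Gauss-Newton to that objective in plain Python. Treating each Kinect v2 as an anchor that reports a distance to a joint, and the sensor layout used in the example, are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def trilaterate(anchors: Sequence[Vec3], ranges: Sequence[float], guess: Vec3, iters: int = 20) -> Vec3:
    """Gauss-Newton minimization of sum_i (||x - a_i|| - r_i)^2."""
    x = list(guess)
    for _ in range(iters):
        jtj = [[0.0] * 3 for _ in range(3)]  # J^T J
        jtr = [0.0] * 3                      # J^T r
        for a, r in zip(anchors, ranges):
            diff = [x[k] - a[k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in diff)) or 1e-12
            res = dist - r                   # range residual
            jac = [d / dist for d in diff]   # gradient of ||x - a|| with respect to x
            for p in range(3):
                jtr[p] += jac[p] * res
                for q in range(3):
                    jtj[p][q] += jac[p] * jac[q]
        step = solve3(jtj, [-g for g in jtr])  # normal equations: (J^T J) step = -J^T r
        x = [x[k] + step[k] for k in range(3)]
        if math.sqrt(sum(s * s for s in step)) < 1e-10:
            break
    return tuple(x)


# Hypothetical, non-coplanar sensor positions (metres) and a joint at a known location.
kinects = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 2.5)]
true_joint = (1.5, 2.0, 1.0)
ranges = [math.dist(k, true_joint) for k in kinects]
estimate = trilaterate(kinects, ranges, guess=(1.0, 1.0, 1.0))
print([round(c, 3) for c in estimate])  # [1.5, 2.0, 1.0]
```

With noisy ranges the same iteration returns the least-squares compromise among sensors, which is how adding sensors can improve accuracy rather than merely averaging positions.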
Sitting Posture Recognition Using a Spiking Neural Network
Wang, Jianquan, Hafidh, Basim, Dong, Haiwei, Saddik, Abdulmotaleb El
Abstract--To increase the quality of citizens' lives, we designed ... In this paper, a spiking neural network is constructed in the form of a liquid state machine. The purpose of this work was to design, implement, and validate a sensing chair system for computer-human interactions with the spiking neural network.
I. ... that improve the quality of life of citizens in smart cities. One of the visions for smart cities is digital twins [1], which are replicas of any living or nonliving entity. According to the Global Burden of Disease (GBD) study [2], more and more people are suffering from lower back pain, among other conditions, due to inappropriate sitting behaviors. To improve quality of life, it is essential to design personalized sensing systems ...
... They use a recurrent network structure so that the intermediate state, that is, the membrane potential of the neurons, is related to the quantity, frequency, and interval of the input spikes and not ...
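The observation that a spiking neuron's membrane potential depends on the quantity, frequency, and interval of its input spikes can be made concrete with a leaky integrate-and-fire (LIF) neuron, the kind of unit commonly used inside liquid state machines. The time constant, threshold, and synaptic weight below are arbitrary illustrative values, not the parameters of the sensing chair network.

```python
import math
from typing import List


def lif_spike_times(input_spikes: List[float], t_end: float, dt: float = 0.1,
                    tau: float = 10.0, weight: float = 0.6, threshold: float = 1.0) -> List[float]:
    """Simulate one LIF neuron (times in ms) and return the times at which it fires.

    Between inputs, the membrane potential v decays as dv/dt = -v / tau; each input
    spike adds `weight` to v. Reaching `threshold` emits a spike and resets v to 0.
    """
    decay = math.exp(-dt / tau)  # exact decay factor over one time step
    pending = sorted(input_spikes)
    v, fired, idx = 0.0, [], 0
    for n in range(int(round(t_end / dt))):
        t = n * dt
        v *= decay
        while idx < len(pending) and pending[idx] <= t + 1e-9:
            v += weight  # integrate an incoming spike
            idx += 1
        if v >= threshold:
            fired.append(round(t, 6))
            v = 0.0
    return fired


print(lif_spike_times([5.0, 6.0], 50.0))         # [6.0]: two closely spaced spikes cross threshold
print(lif_spike_times([5.0, 25.0], 50.0))        # []: the same two spikes, too far apart
print(lif_spike_times([5.0, 10.0, 15.0], 50.0))  # [15.0]: a third spike makes the difference
```

In a liquid state machine, many such neurons are wired recurrently, and a trained readout classifies the resulting spiking activity, for example into sitting postures.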