Wang, Ti
TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM
Jiang, Peifeng, Liu, Hong, Li, Xia, Wang, Ti, Zhang, Fabian, Buhmann, Joachim M.
The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Our analysis identifies the primary causes of these issues: densely sampled views affected by motion blur, and cumulative errors in dense pose estimation that arise when losses are computed between noisy input images and rendered results, both of which hinder the convergence of 3DGS rendering. We therefore introduce a 3DGS-based SLAM system that leverages the efficiency and flexibility of 3DGS to achieve real-time performance while remaining robust against sensor noise, motion blur, and the challenges posed by long-session SLAM. Central to this approach is the Fusion Bridge module, which seamlessly integrates tracking-centered ORB Visual Odometry with mapping-centered online 3DGS. This module enables precise pose initialization through joint optimization of re-projection and rendering losses, as well as strategic view selection, enhancing rendering convergence in large-scale scenes. Extensive experiments demonstrate state-of-the-art rendering quality and localization accuracy, positioning this system as a promising solution for real-world robotics applications that require stable, near-real-time performance. Our project is available at https://ZeldaFromHeaven.github.io/TAMBRIDGE/
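The joint optimization of re-projection and rendering losses mentioned above can be illustrated with a minimal sketch. The weighting scheme and all names here are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def joint_pose_loss(reproj_residuals, rendered, observed, lam=0.5):
    """Toy combined pose-initialization objective: an ORB feature
    re-projection term plus a photometric rendering term, traded off
    by a hypothetical weight `lam`."""
    # Mean squared re-projection error over matched ORB features (N, 2)
    l_reproj = np.mean(np.sum(reproj_residuals ** 2, axis=-1))
    # Mean absolute photometric error between the 3DGS render and the image
    l_render = np.mean(np.abs(rendered - observed))
    return lam * l_reproj + (1.0 - lam) * l_render
```

In practice both terms would be minimized over the camera pose; this sketch only shows how the two error sources enter a single objective.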
SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction
Zhang, Wenfeng, Li, Xin, Li, Anqi, Huang, Xiaoting, Wang, Ti, Gao, Honglei
The scenario of multivariate time series occurs in various domains of life. Researchers utilize historical weather data from different regions to predict future rainfall intensity [1]. Taxi data is used to help cities predict travel resources and reduce traffic congestion [2]. Exception monitoring is performed on various states in manufacturing systems and internet services [3]. Since the introduction of Graph Convolutional Networks (GCN) in 2017 [4], this method has been widely used in the field of Multivariate Time Series (MTS) for spatial semi-supervised and self-supervised learning. By interleaving one-dimensional convolution with gated linear units (GLU) and graph convolution, and appending an output layer after this "sandwich" structure [5], accurate prediction of traffic flow speed can be achieved. By incorporating attention mechanisms into GCN, it is possible to distinguish the importance of different nodes and utilize Gated Recurrent Units (GRU) for frequency-domain feature extraction on long time series, leading to a substantial improvement in prediction accuracy [7]. The aforementioned work demonstrates the effectiveness of GCN in learning spatial features and its strong adaptability to non-Euclidean structured data.

A. Problem 1: Dilated convolutions break adjacent time steps

Focusing on the direction of traffic flow prediction (Figure 1, 2), recent Temporal Convolutional Networks [8] and Graph WaveNet (GWNet) [9] have adopted the mechanism of GCN. However, in the frequency domain, they use dilated convolutions
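The gated linear unit (GLU) interleaved with graph convolution in the "sandwich" structure [5] can be sketched minimally. This shows only the gating mechanism itself, not the paper's full layer:

```python
import numpy as np

def glu(x):
    """Gated linear unit: split the channel dimension in half and gate
    one half with a sigmoid of the other, GLU(x) = a * sigmoid(b).
    A minimal sketch of the gating used between graph convolutions."""
    a, b = np.split(x, 2, axis=-1)
    return a * (1.0 / (1.0 + np.exp(-b)))
```

In the sandwich structure, the output of a one-dimensional convolution doubles the channel count so that this split-and-gate step halves it back, letting the gate decide how much of each temporal feature passes to the following graph convolution.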
Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation
Wang, Ti, Liu, Mengyuan, Liu, Hong, Ren, Bin, You, Yingxuan, Li, Wenhao, Sebe, Nicu, Li, Xia
Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel at fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on the projection constraint, which only ensures alignment in 2D space, potentially leading to overfitting. To address this, we propose an Uncertainty-Aware testing-time Optimization (UAO) framework, which preserves the prior information of the pre-trained model and alleviates overfitting using the uncertainty of joints. Specifically, during the training phase, we design an effective 2D-to-3D network that estimates the corresponding 3D pose while quantifying the uncertainty of each 3D joint. For optimization during testing, the proposed framework freezes the pre-trained model and optimizes only a latent state. A projection loss is then employed to ensure the generated poses are well aligned in 2D space for high-quality optimization. Furthermore, we utilize the uncertainty of each joint to determine the extent to which that joint may be adjusted during optimization. The effectiveness and superiority of the proposed framework are validated through extensive experiments on two challenging datasets: Human3.6M and MPI-INF-3DHP. Notably, our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M. Our source code will be open-sourced.
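One plausible reading of the uncertainty-aware objective is a projection term plus a per-joint anchor term whose weight shrinks for uncertain joints, so confident joints stay near the pre-trained estimate while uncertain ones are freer to move. The 1/sigma^2 weighting and all variable names below are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def uncertainty_weighted_loss(pred_2d, target_2d, pose_3d, pose_init, sigma):
    """Illustrative test-time objective: 2D projection alignment plus a
    per-joint regularizer anchoring the 3D pose to the pre-trained
    prediction `pose_init`, weighted by per-joint confidence 1/sigma^2."""
    # Projection term: keep the generated pose aligned in 2D space
    l_proj = np.mean(np.sum((pred_2d - target_2d) ** 2, axis=-1))
    # Anchor term: uncertain joints (large sigma) are penalized less for moving
    w = 1.0 / (sigma ** 2)
    l_anchor = np.mean(w * np.sum((pose_3d - pose_init) ** 2, axis=-1))
    return l_proj + l_anchor
```

During testing, this objective would be minimized over the latent state only, with the network weights frozen as described in the abstract.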
Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
You, Yingxuan, Liu, Hong, Wang, Ti, Li, Wenhao, Ding, Runwei, Li, Xia
Despite significant progress in single image-based 3D human mesh recovery, accurately and smoothly recovering 3D human motion from a video remains challenging. Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertex regression from the estimated 3D pose and temporal image feature. Specifically, we propose a two-stream encoder that estimates the mid-frame 3D pose and extracts a temporal image feature from the input image sequence. In addition, we design a co-evolution decoder that performs pose and mesh interactions with the image-guided Adaptive Layer Normalization (AdaLN) to make pose and mesh fit the human body shape. Extensive experiments demonstrate that the proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M, and MPI-INF-3DHP. Our code is available at https://github.com/kasvii/PMCE.
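The adaptive layer normalization underlying AdaLN can be sketched in a few lines: standard per-token normalization whose scale and shift would be predicted from the image feature rather than learned as fixed parameters. This is a sketch of the mechanism in spirit, with `gamma` and `beta` passed in directly; the paper's exact module is not reproduced here:

```python
import numpy as np

def adaln(x, gamma, beta, eps=1e-5):
    """Adaptive Layer Normalization: normalize features along the last
    axis, then scale and shift with modulation parameters (`gamma`,
    `beta`) that in PMCE would be predicted from the image feature."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```

Making the scale and shift image-conditioned is what lets the normalization guide both pose and mesh features toward the body shape observed in the input frames.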
Interweaved Graph and Attention Network for 3D Human Pose Estimation
Wang, Ti, Liu, Hong, Ding, Runwei, Li, Wenhao, You, Yingxuan, Li, Xia
Despite substantial progress in 3D human pose estimation from a single-view image, prior works rarely explore global and local correlations, leading to insufficient learning of human skeleton representations. To address this issue, we propose a novel Interweaved Graph and Attention Network (IGANet) that allows bidirectional communication between graph convolutional networks (GCNs) and attention. Specifically, we introduce an IGA module, where attention is provided with local information from GCNs, and GCNs are injected with global information from attention. Additionally, we design a simple yet effective U-shaped multi-layer perceptron (uMLP), which can capture multi-granularity information for body joints. Extensive experiments on two popular benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) are conducted to evaluate our proposed method. The results show that IGANet achieves state-of-the-art performance on both datasets. Code is available at https://github.com/xiu-cs/IGANet.
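The local/global interplay of the IGA module can be illustrated with a toy block: a graph branch aggregates each joint's neighbors via a normalized adjacency, a self-attention branch mixes all joints globally, and the two summaries are fused. Learnable weights are omitted (identity) to keep the sketch minimal; this is an illustration of the idea, not the module itself:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def iga_block(x, a_hat):
    """Toy local-global fusion: `x` is (J, C) joint features, `a_hat`
    a (J, J) normalized skeleton adjacency. The GCN branch gathers
    local neighbor information; the attention branch gathers global
    context across all joints; both are fused with a residual."""
    local = a_hat @ x                                   # GCN branch
    attn = softmax(x @ x.T / np.sqrt(x.shape[-1]))      # joint-to-joint weights
    global_ = attn @ x                                  # attention branch
    return x + local + global_                          # residual fusion
```

In the real IGA module the exchange is bidirectional within the block (each branch conditions the other before fusion), rather than a single additive merge as sketched here.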
GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose
You, Yingxuan, Liu, Hong, Li, Xia, Li, Wenhao, Wang, Ti, Ding, Runwei
3D human mesh recovery from a 2D pose plays an important role in various applications. However, it is hard for existing methods to simultaneously capture the multiple relations during the evolution from skeleton to mesh, including joint-joint, joint-vertex and vertex-vertex relations, which often leads to implausible results. To address this issue, we propose a novel solution, called GATOR, that contains an encoder of Graph-Aware Transformer (GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these multiple relations. Specifically, GAT combines a GCN and a graph-aware self-attention in parallel to capture physical and hidden joint-joint relations. Furthermore, MDR models joint-vertex and vertex-vertex interactions to explore joint and vertex relations. Based on the clustering characteristics of vertex offset fields, MDR regresses the vertices by composing the predicted base motions. Extensive experiments show that GATOR achieves state-of-the-art performance on two challenging benchmarks.
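The base-motion composition behind MDR can be sketched as a low-rank decomposition of the vertex offset field: because vertex offsets cluster, they can be expressed as weighted combinations of a small set of shared base motions. The shapes and names below are assumptions for illustration:

```python
import numpy as np

def compose_vertex_offsets(bases, coeffs):
    """Sketch of base-motion composition: `bases` is (K, V, 3), a small
    set of K shared motion fields over V vertices; `coeffs` is (K,),
    the predicted per-base weights. Returns the (V, 3) offset field
    applied to the template mesh."""
    return np.tensordot(coeffs, bases, axes=1)
```

In this picture, the regressor only has to predict K coefficients (plus the bases themselves) rather than a free offset per vertex, which is how the clustering of vertex offsets is exploited.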