 superpoint


Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis

Neural Information Processing Systems

While novel view synthesis for dynamic scenes has made significant progress, capturing skeleton models of objects and re-posing them remains a challenging task. To tackle this problem, we propose a novel approach that automatically discovers the associated skeleton model for dynamic objects from videos, without the need for object-specific templates. Our approach uses 3D Gaussian Splatting and superpoints to reconstruct dynamic objects. Treating superpoints as rigid parts, we discover the underlying skeleton model through intuitive cues and optimize it with a kinematic model. In addition, an adaptive control strategy is applied to avoid the emergence of redundant superpoints. Extensive experiments demonstrate the effectiveness and efficiency of our method in obtaining re-posable 3D objects. Our approach not only achieves excellent visual fidelity but also allows real-time rendering of high-resolution images.
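The core idea above — treating each superpoint as a rigid part and re-posing Gaussians through per-part transforms — can be sketched minimally. This is a hedged illustration, not the paper's implementation: the function names (`repose_gaussians`, `rotation_z`) and the flat per-part transform arrays are assumptions; the actual method chains transforms along a discovered kinematic tree.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis (standing in for one joint angle of a kinematic model)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def repose_gaussians(centers, assignment, part_R, part_t):
    """Apply each superpoint's rigid transform to the Gaussian centers it owns.

    centers    : (N, 3) Gaussian means
    assignment : (N,)   index of the superpoint (rigid part) owning each Gaussian
    part_R     : (P, 3, 3) per-part rotations
    part_t     : (P, 3)    per-part translations
    """
    R = part_R[assignment]            # (N, 3, 3) gather each Gaussian's part rotation
    t = part_t[assignment]            # (N, 3)
    return np.einsum("nij,nj->ni", R, centers) + t
```

Rotating only the second part while leaving the first fixed re-poses exactly the Gaussians assigned to it, which is the behavior the skeleton discovery relies on.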




[Submission 1194: "DISK"] We thank all reviewers for their insightful comments, and address their concerns

Neural Information Processing Systems

R1: DISK is based on previous work (U-Net, SuperPoint) and only offers moderate innovation. We will clarify this in the paper. We tuned inference parameters (NMS window & RANSAC settings) by search, as described in L194-197. R1, R3, R5: What is the contribution of individual components of the pipeline? Experimentally, we observe that 19.9% of features from grid selection […] This has three potential downsides.
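The rebuttal mentions tuning an NMS window for inference. As a hedged sketch of what window-based non-maximum suppression over a keypoint score map typically looks like (the function name `window_nms` and the dense-map formulation are illustrative assumptions, not DISK's actual code):

```python
import numpy as np

def window_nms(scores, window=5):
    """Keep a pixel only if its score is the maximum within its
    (window x window) neighborhood -- the usual NMS applied to dense
    keypoint score maps in SuperPoint/DISK-style detectors."""
    h, w = scores.shape
    r = window // 2
    padded = np.pad(scores, r, mode="constant", constant_values=-np.inf)
    keep = np.ones_like(scores, dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            # Score of the neighbor at offset (dy, dx) for every pixel at once.
            shifted = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            keep &= scores >= shifted
    return keep
```

Searching over `window` (and RANSAC thresholds downstream) is the kind of inference-parameter tuning the rebuttal describes.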



Consensus Learning with Deep Sets for Essential Matrix Estimation

Neural Information Processing Systems

Robust estimation of the essential matrix, which encodes the relative position and orientation of two cameras, is a fundamental step in structure-from-motion pipelines. Recent deep-learning-based methods have achieved accurate estimation by using complex network architectures that involve graphs, attention layers, and hard pruning steps.
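For context on what the networks above estimate: the essential matrix for relative pose (R, t) is E = [t]ₓ R, and correct correspondences in normalized image coordinates satisfy the epipolar constraint x₂ᵀ E x₁ = 0. A minimal sketch of this standard construction (not the paper's Deep Sets model):

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([
        [0.0,  -t[2],  t[1]],
        [t[2],  0.0,  -t[0]],
        [-t[1], t[0],  0.0],
    ])

def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R for relative rotation R and translation t
    (convention: a point X1 in camera 1's frame maps to X2 = R @ X1 + t)."""
    return skew(t) @ R
```

Any 3D point projected into both cameras then satisfies the epipolar constraint up to numerical precision, which is exactly the residual robust estimators score correspondences by.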




Efficient and Accurate Downfacing Visual Inertial Odometry

Kühne, Jonas, Vogt, Christian, Magno, Michele, Benini, Luca

arXiv.org Artificial Intelligence

This article has been accepted for publication in the IEEE Internet of Things Journal (IoT-J). Abstract: Visual Inertial Odometry (VIO) is a widely used computer vision method that determines an agent's movement through a camera and an IMU sensor. This paper presents an efficient and accurate VIO pipeline optimized for applications on micro- and nano-UAVs. The proposed design incorporates state-of-the-art feature detection and tracking methods (SuperPoint, PX4FLOW, ORB), all optimized and quantized for emerging RISC-V-based ultra-low-power parallel systems on chips (SoCs). Furthermore, by employing a rigid body motion model, the pipeline reduces estimation errors and achieves improved accuracy in planar motion scenarios. The pipeline's suitability for real-time VIO is assessed on an ultra-low-power SoC in terms of compute requirements and tracking accuracy after quantization. The pipeline, including the three feature tracking methods, was implemented on the SoC for real-world validation. This design bridges the gap between high-accuracy VIO pipelines that are traditionally run on computationally powerful systems and lightweight implementations suitable for microcontrollers. The optimized pipeline on the GAP9 low-power SoC demonstrates an average reduction in RMSE of up to a factor of 3.65x over the baseline pipeline when using the ORB feature tracker. The analysis of the computational complexity of the feature trackers further shows that PX4FLOW achieves on-par tracking accuracy with ORB at a lower runtime for movement speeds below 24 pixels/frame.

Visual Inertial Odometry (VIO) describes the process of determining an agent's movement through the use of camera and Inertial Measurement Unit (IMU) data [1]. Cameras are used in pure Visual Odometry (VO) to generate a movement estimate from one frame to another by considering the displacement of features or brightness patches between camera images [2]. While stereo VO (i.e., using two cameras) can estimate metric depth information through extrinsic […]

This work was supported by the Swiss National Science Foundation's TinyTrainer project under Grant number 207913.
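The abstract compares feature-based trackers (ORB, SuperPoint) against PX4FLOW, which estimates inter-frame displacement by block matching rather than descriptor matching. A hedged sketch of that idea — sum-of-absolute-differences (SAD) block matching over a small search window — follows; the function name and parameters are illustrative, not the paper's or PX4FLOW's actual implementation:

```python
import numpy as np

def sad_block_match(prev, curr, y, x, patch=4, search=8):
    """Estimate the (dy, dx) displacement of the patch centered at (y, x)
    between two grayscale frames by minimizing the sum of absolute
    differences (SAD), the block-matching scheme behind flow-based
    trackers in the spirit of PX4FLOW."""
    ref = prev[y - patch:y + patch + 1, x - patch:x + patch + 1].astype(np.int32)
    best_cost, best_dyx = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - patch:y + dy + patch + 1,
                        x + dx - patch:x + dx + patch + 1].astype(np.int32)
            cost = np.abs(ref - cand).sum()  # SAD over the candidate patch
            if best_cost is None or cost < best_cost:
                best_cost, best_dyx = cost, (dy, dx)
    return best_dyx
```

The `search` radius bounds the trackable motion per frame, which is consistent with the abstract's observation that the flow tracker is competitive only below a certain movement speed (24 pixels/frame in their analysis).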

