Explicit Spati

Neural Information Processing Systems

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs.
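The pointer memory described above can be illustrated with a small toy sketch. It is not the Point3R implementation: the class name `PointerMemory`, the `merge_radius` threshold, and the averaging rule are all assumptions chosen for illustration, since the abstract does not specify how fusion is carried out. The sketch shows only the general idea of pointers that carry a 3D position plus a feature, with nearby observations merged so the memory stays spatially uniform.

```python
import numpy as np


class PointerMemory:
    """Toy explicit spatial memory: each pointer has a 3D position and a feature.

    Hypothetical illustration only; the names and the fusion rule are assumptions.
    """

    def __init__(self, feature_dim, merge_radius):
        self.positions = np.zeros((0, 3))           # one row per pointer
        self.features = np.zeros((0, feature_dim))  # feature attached to each pointer
        self.merge_radius = merge_radius            # assumed distance threshold for fusion

    def fuse(self, new_positions, new_features):
        """Integrate observations from the latest frame into the memory."""
        new_positions = np.asarray(new_positions, dtype=float)
        new_features = np.asarray(new_features, dtype=float)
        for pos, feat in zip(new_positions, new_features):
            if len(self.positions) > 0:
                dists = np.linalg.norm(self.positions - pos, axis=1)
                nearest = int(np.argmin(dists))
                if dists[nearest] < self.merge_radius:
                    # Close to an existing pointer: average position and feature
                    # (an assumed rule) instead of adding a new pointer.
                    self.positions[nearest] = 0.5 * (self.positions[nearest] + pos)
                    self.features[nearest] = 0.5 * (self.features[nearest] + feat)
                    continue
            # No pointer nearby: add a new one at this location.
            self.positions = np.vstack([self.positions, pos])
            self.features = np.vstack([self.features, feat])


memory = PointerMemory(feature_dim=2, merge_radius=0.5)
memory.fuse([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[1.0, 1.0], [2.0, 2.0]])
memory.fuse([[0.1, 0.0, 0.0]], [[3.0, 3.0]])  # merges into the first pointer
print(len(memory.positions))
```

Merging by distance keeps the number of pointers bounded by the spatial extent of the scene rather than by the number of frames, which is the property the abstract attributes to its fusion mechanism.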


Direct Multi-view Multi-person 3D Pose Estimation

Neural Information Processing Systems

Multi-view multi-person 3D pose estimation aims to localize 3D skeleton joints for each person instance in a scene from multi-view camera inputs. It is a fundamental task that benefits many real-world applications (such as surveillance, sportscast, gaming and mixed reality) and is mainly tackled by reconstruction-based [6, 14, 4] and volumetric [40] approaches in previous literature, as shown in Fig. 1 (a) and (b).


64f1f27bf1b4ec22924fd0acb550c235-Paper.pdf

Neural Information Processing Systems

The proposed MLP decoder aggregates information from different layers, thus combining both local attention and global attention to render powerful representations.






Self-Erasing Network for Integral Object Attention

Neural Information Processing Systems

To tackle this issue and improve the quality of object attention, we introduce a simple yet effective Self-Erasing Network (SeeNet) to prohibit attention from spreading to unexpected background regions.