spatial feature
- Asia > China (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.56)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Vision > Video Understanding (0.36)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- Asia > China > Zhejiang Province (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
- Oceania > Australia (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)
Self-Supervised Multi-Object Tracking with Cross-Input Consistency (Supplementary Material)
Favyen Bastani, Songtao He, Sam Madden
For each training sequence $\langle I_0, \ldots, I_n \rangle$, Only-Occlusion randomly selects four indexes $0 < k_1 \le k_2 < k_3 \le k_4 < n$ to construct two disjoint frame subsequences $\langle I_{k_1}, \ldots, I_{k_2} \rangle$ and $\langle I_{k_3}, \ldots, I_{k_4} \rangle$. Learning to merely compare detection features across consecutive frames would yield low accuracy since features in occluded frames are not observed. This strategy yields high consistency because it is unaffected by occluded intermediate frames. We select two indexes $0 < k_5, k_6 < n$. Then, we randomly pick $k_5$ and $k_6$ such that $k_3 \le k_5 \le k_4$ and $k_1 \le k_6 \le k_2$, i.e., the hand-off for one tracker occurs when the other tracker observes a simulated occlusion.
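The index selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes non-strict inequalities within each subsequence (k1 ≤ k2 and k3 ≤ k4), uses simple rejection sampling as a stand-in for whatever sampling scheme the paper uses, and the function names `sample_only_occlusion_indexes` and `split_subsequences` are hypothetical.

```python
import random


def sample_only_occlusion_indexes(n, rng=None):
    """Sample (k1, ..., k6) for a sequence I_0..I_n (hypothetical helper).

    Guarantees 0 < k1 <= k2 < k3 <= k4 < n, k3 <= k5 <= k4, k1 <= k6 <= k2.
    """
    if rng is None:
        rng = random.Random()
    if n < 3:
        raise ValueError("need n >= 3 so two disjoint subsequences fit in 1..n-1")
    # Rejection sampling (an assumption, not the paper's scheme):
    # draw four indexes in 1..n-1, sort them, and retry until the
    # two subsequences are disjoint (k2 < k3).
    while True:
        k1, k2, k3, k4 = sorted(rng.randint(1, n - 1) for _ in range(4))
        if k2 < k3:
            break
    k5 = rng.randint(k3, k4)  # hand-off index inside the second subsequence
    k6 = rng.randint(k1, k2)  # hand-off index inside the first subsequence
    return k1, k2, k3, k4, k5, k6


def split_subsequences(frames, ks):
    """Return the two disjoint frame subsequences selected by k1..k4."""
    k1, k2, k3, k4 = ks[:4]
    return frames[k1:k2 + 1], frames[k3:k4 + 1]
```

With `frames = list(range(11))` and indexes `(1, 3, 5, 8, 6, 2)`, `split_subsequences` returns frames 1 to 3 and frames 5 to 8, which share no frame.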
Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting
Tang, Jian, Pang, Pu, Sun, Haowen, Ma, Chengzhong, Chen, Xingyu, Huang, Hua, Lan, Xuguang
Cross-domain transfer in robotic manipulation remains a longstanding challenge due to the significant domain gap between simulated and real-world environments. Existing methods such as domain randomization, adaptation, and sim-real calibration often require extensive tuning or fail to generalize to unseen scenarios. To address this issue, we observe that if domain-invariant features are utilized during policy training in simulation, and the same features can be extracted and provided as the input to the policy during real-world deployment, the domain gap can be effectively bridged, leading to significantly improved policy generalization. Accordingly, we propose Semantic 2D Gaussian Splatting (S2GS), a novel representation method that extracts object-centric, domain-invariant spatial features. S2GS constructs multi-view 2D semantic fields and projects them into a unified 3D space via feature-level Gaussian splatting. A semantic filtering mechanism removes irrelevant background content, ensuring clean and consistent inputs for policy learning. To evaluate the effectiveness of S2GS, we adopt Diffusion Policy as the downstream learning algorithm and conduct experiments in the ManiSkill simulation environment, followed by real-world deployment. Results demonstrate that S2GS significantly improves sim-to-real transferability, maintaining high and stable task performance in real-world scenarios.
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Wu, Yuqi, Zheng, Wenzhao, Zhou, Jie, Lu, Jiwen
Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs.
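To make the idea of an explicit pointer memory concrete, the sketch below stores pointers as (3D position, feature) pairs and folds each new observation into the nearest pointer when it lies within a merge radius, otherwise adding a new pointer. Everything here is an assumption for illustration: the abstract does not specify the fusion rule, so a distance threshold with a running-average feature update is used as a stand-in, pointer positions are kept fixed, and the names `Pointer`, `PointerMemory`, and `merge_radius` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pointer:
    position: tuple  # (x, y, z) in the global coordinate system
    feature: list    # spatial feature that changes as observations arrive
    count: int = 1   # number of observations fused into this pointer


class PointerMemory:
    """Toy explicit memory; not the Point3R implementation."""

    def __init__(self, merge_radius):
        self.merge_radius = merge_radius
        self.pointers = []

    def integrate(self, position, feature):
        """Fuse one observation into the memory (assumed fusion rule)."""
        nearest = min(
            self.pointers,
            key=lambda p: math.dist(p.position, position),
            default=None,
        )
        if nearest is not None and math.dist(nearest.position, position) <= self.merge_radius:
            # Running average of features; the position stays fixed.
            nearest.count += 1
            w = 1.0 / nearest.count
            nearest.feature = [
                (1 - w) * old + w * new
                for old, new in zip(nearest.feature, feature)
            ]
        else:
            # No pointer nearby: allocate a new one at this position.
            self.pointers.append(Pointer(tuple(position), list(feature)))
```

Merging nearby observations instead of appending every point keeps the number of pointers bounded by the spatial extent of the scene divided by the merge radius, which is one simple way to keep such a memory roughly uniform in space.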
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.35)
GRIT-LP: Graph Transformer with Long-Range Skip Connection and Partitioned Spatial Graphs for Accurate Ice Layer Thickness Prediction
Liu, Zesheng, Rahnemoonfar, Maryam
Graph transformers have demonstrated remarkable capability on complex spatiotemporal tasks, yet their depth is often limited by oversmoothing and weak long-range dependency modeling. To address these challenges, we introduce GRIT-LP, a graph transformer explicitly designed for polar ice-layer thickness estimation from polar radar imagery. Accurately estimating ice layer thickness is critical for understanding snow accumulation, reconstructing past climate patterns and reducing uncertainties in projections of future ice sheet evolution and sea level rise. GRIT-LP combines an inductive geometric graph learning framework with a self-attention mechanism, and introduces two major innovations that jointly address challenges in modeling the spatiotemporal patterns of ice layers: a partitioned spatial graph construction strategy that forms overlapping, fully connected local neighborhoods to preserve spatial coherence and suppress noise from irrelevant long-range links, and a long-range skip connection mechanism within the transformer that improves information flow and mitigates oversmoothing in deeper attention layers. We conducted extensive experiments, demonstrating that GRIT-LP outperforms current state-of-the-art methods with a 24.92% improvement in root mean squared error. These results highlight the effectiveness of graph transformers in modeling spatiotemporal patterns by capturing both localized structural features and long-range dependencies across internal ice layers, and demonstrate their potential to advance data-driven understanding of cryospheric processes.
Introduction
Graph transformers have proven to be highly effective for modeling complex graph-structured data, with a wide range of applications in real-world scenarios, particularly those involving spatiotemporal patterns. Their ability to capture intricate relationships and dependencies makes them highly valuable in domains such as pedestrian trajectory prediction [1] and traffic prediction [2].
Despite their success, current graph transformer architectures face notable limitations, including overfitting and oversmoothing, a phenomenon where node features become indistinguishable as layers deepen [3]. Additionally, many existing graph transformers are relatively shallow, limiting their ability to effectively capture the complex, long-range dependencies that often emerge in real-world datasets.
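The partitioned spatial graph idea from the GRIT-LP abstract (overlapping, fully connected local neighborhoods with no long-range links) can be illustrated with a small sketch. It assumes nodes are ordered along a single axis and partitioned with a sliding window; the function name `partitioned_edges` and the `window` and `stride` parameters are hypothetical, and the paper's actual construction may differ.

```python
from itertools import combinations


def partitioned_edges(num_nodes, window, stride):
    """Build undirected edges from overlapping, fully connected windows.

    Nodes 0..num_nodes-1 are assumed to be ordered along one spatial axis.
    Choosing stride < window makes consecutive neighborhoods overlap.
    """
    if not 0 < stride < window:
        raise ValueError("need 0 < stride < window for overlapping neighborhoods")
    edges = set()
    start = 0
    while True:
        block = range(start, min(start + window, num_nodes))
        # Fully connect every pair of nodes inside this local neighborhood.
        edges.update(combinations(block, 2))
        if start + window >= num_nodes:
            break
        start += stride
    return sorted(edges)
```

For five nodes with `window=3` and `stride=2`, the two neighborhoods are {0, 1, 2} and {2, 3, 4}: node 2 is shared, so information can pass between partitions, while distant pairs such as (0, 4) are never linked directly.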
- North America > Greenland (0.04)
- North America > United States > Kansas (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation
Zhu, Tianheng, Yu, Yinfeng, Wang, Liejun, Sun, Fuchun, Zheng, Wendong
This paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3-5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker's potential for real-time multimedia applications.
- Asia > China > Tianjin Province > Tianjin (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
Zhengyang Yu, Zhaoyuan Yang
However, such a solution often shows unsatisfactory editability on the source image. To address this, we propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods. Specifically, we enhance the source image conditioned editability of a personalized diffusion model via a novel Editability Driven Score Distillation (EDSD) objective.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)