AITopics

Country: Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing Systems Feb-16-2026, 13:38:50 GMT

Diversifying Spatial-Temporal Perception for Video Domain Generalization

Kun-Yu Lin

However, existing video classification models rely on the i.i.d.

artificial intelligence, computer vision, machine learning

Country:

Asia > China (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.36)

Neural Information Processing Systems Feb-14-2026, 13:55:08 GMT

5c54e016197805946481d786d80a662e-Paper-Conference.pdf

correlation, machine learning, natural language

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China > Zhejiang Province (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Neural Information Processing Systems Feb-11-2026, 04:30:41 GMT

456f9445d0fa1a932d19584ab788c787-Paper-Conference.pdf

detection, information, spatial feature

Country:

Oceania > Australia (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)

Neural Information Processing Systems Feb-9-2026, 08:26:58 GMT

Self-Supervised Multi-Object Tracking with Cross-Input Consistency (Supplementary Material)

Favyen Bastani, Songtao He, Sam Madden

For each training sequence $\langle I_0, \ldots, I_n \rangle$, Only-Occlusion randomly selects four indexes $0 < k_1 \le k_2 < k_3 \le k_4 < n$ to construct two disjoint frame subsequences $\langle I_{k_1}, \ldots, I_{k_2} \rangle$ and $\langle I_{k_3}, \ldots, I_{k_4} \rangle$. Learning to merely compare detection features across consecutive frames would yield low accuracy since features in occluded frames are not observed. This strategy yields high consistency because it is unaffected by occluded intermediate frames. We select two indexes $0 < k_5, k_6 < n$. Then, we randomly pick $k_5$ and $k_6$ such that $k_3 \le k_5 \le k_4$ and $k_1 \le k_6 \le k_2$, i.e., the hand-off for one tracker occurs when the other tracker observes a simulated occlusion.
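
The index selection described in this excerpt can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes the comparison operators lost in extraction are non-strict (k1 <= k2 < k3 <= k4, k3 <= k5 <= k4, k1 <= k6 <= k2), and the function name `sample_only_occlusion_indexes` is hypothetical. The sampling is valid but not uniform over all admissible index tuples.

```python
import random


def sample_only_occlusion_indexes(n, rng=random):
    """Sample (k1, k2, k3, k4, k5, k6) for a frame sequence I_0..I_n.

    Assumed constraints (operators partly reconstructed from the excerpt):
    0 < k1 <= k2 < k3 <= k4 < n, k3 <= k5 <= k4, k1 <= k6 <= k2.
    """
    if n < 3:
        raise ValueError("need n >= 3 so that two disjoint subsequences fit")
    # Pick the inner boundary first so that k2 < k3 always holds.
    k2, k3 = sorted(rng.sample(range(1, n), 2))
    k1 = rng.randint(1, k2)      # 0 < k1 <= k2
    k4 = rng.randint(k3, n - 1)  # k3 <= k4 < n
    # Hand-off indexes: each falls inside the other subsequence's span.
    k5 = rng.randint(k3, k4)
    k6 = rng.randint(k1, k2)
    return k1, k2, k3, k4, k5, k6
```

Passing a seeded `random.Random` instance as `rng` makes the sampled subsequences reproducible across runs.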

artificial intelligence, machine learning, supplementary material

Technology:

Information Technology > Artificial Intelligence > Vision (0.35)
Information Technology > Artificial Intelligence > Machine Learning (0.32)

arXiv.org Artificial Intelligence Dec-5-2025

Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting

Tang, Jian, Pang, Pu, Sun, Haowen, Ma, Chengzhong, Chen, Xingyu, Huang, Hua, Lan, Xuguang

Cross-domain transfer in robotic manipulation remains a longstanding challenge due to the significant domain gap between simulated and real-world environments. Existing methods such as domain randomization, adaptation, and sim-real calibration often require extensive tuning or fail to generalize to unseen scenarios. To address this issue, we observe that if domain-invariant features are utilized during policy training in simulation, and the same features can be extracted and provided as the input to policy during real-world deployment, the domain gap can be effectively bridged, leading to significantly improved policy generalization. Accordingly, we propose Semantic 2D Gaussian Splatting (S2GS), a novel representation method that extracts object-centric, domain-invariant spatial features. S2GS constructs multi-view 2D semantic fields and projects them into a unified 3D space via feature-level Gaussian splatting. A semantic filtering mechanism removes irrelevant background content, ensuring clean and consistent inputs for policy learning. To evaluate the effectiveness of S2GS, we adopt Diffusion Policy as the downstream learning algorithm and conduct experiments in the ManiSkill simulation environment, followed by real-world deployment. Results demonstrate that S2GS significantly improves sim-to-real transferability, maintaining high and stable task performance in real-world scenarios.

artificial intelligence, machine learning, natural language

2512.04731

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

arXiv.org Artificial Intelligence Dec-1-2025

Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory

Wu, Yuqi, Zheng, Wenzhao, Zhou, Jie, Lu, Jiwen

Dense 3D scene reconstruction from an ordered sequence or unordered image collections is a critical step when bringing research in computer vision into practical scenarios. Following the paradigm introduced by DUSt3R, which unifies an image pair densely into a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. To be specific, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates scene information nearby in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction and design a simple yet effective fusion mechanism to ensure that our pointer memory is uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs.
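
To make the idea of an explicit pointer memory concrete, here is a minimal Python sketch. It is not the Point3R implementation: the abstract does not specify the fusion rule, so the distance threshold `radius`, the running-mean update, and the names `Pointer` and `fuse` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pointer:
    position: tuple  # (x, y, z) in the global coordinate system
    feature: list    # aggregated spatial feature for the nearby scene
    count: int = 1   # number of observations merged so far


def fuse(memory, position, feature, radius=0.5):
    """Merge an observation into the nearest pointer within `radius`,
    or append a new pointer if none is close enough (assumed rule)."""
    best, best_dist = None, radius
    for p in memory:
        d = math.dist(p.position, position)
        if d <= best_dist:
            best, best_dist = p, d
    if best is None:
        memory.append(Pointer(tuple(position), list(feature)))
        return memory
    # Running mean: the stored feature changes as new frames arrive.
    n = best.count
    best.feature = [(old * n + new) / (n + 1)
                    for old, new in zip(best.feature, feature)]
    best.count = n + 1
    return memory
```

Merging nearby observations instead of always appending is one simple way to keep the memory spatially uniform and bounded in size, which is the property the abstract attributes to its fusion mechanism.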

artificial intelligence, machine learning, spatial reasoning

2507.02863

Country: Asia (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.35)

Liu, Zesheng, Rahnemoonfar, Maryam

GRIT-LP: Graph Transformer with Long-Range Skip Connection and Partitioned Spatial Graphs for Accurate Ice Layer Thickness Prediction

arXiv.org Artificial Intelligence Nov-25-2025

Graph transformers have demonstrated remarkable capability on complex spatio-temporal tasks, yet their depth is often limited by oversmoothing and weak long-range dependency modeling. To address these challenges, we introduce GRIT-LP, a graph transformer explicitly designed for polar ice-layer thickness estimation from polar radar imagery. Accurately estimating ice layer thickness is critical for understanding snow accumulation, reconstructing past climate patterns, and reducing uncertainties in projections of future ice sheet evolution and sea level rise. GRIT-LP combines an inductive geometric graph learning framework with a self-attention mechanism, and introduces two major innovations that jointly address challenges in modeling the spatio-temporal patterns of ice layers: a partitioned spatial graph construction strategy that forms overlapping, fully connected local neighborhoods to preserve spatial coherence and suppress noise from irrelevant long-range links, and a long-range skip connection mechanism within the transformer that improves information flow and mitigates oversmoothing in deeper attention layers. We conducted extensive experiments, demonstrating that GRIT-LP outperforms current state-of-the-art methods with a 24.92% improvement in root mean squared error. These results highlight the effectiveness of graph transformers in modeling spatio-temporal patterns by capturing both localized structural features and long-range dependencies across internal ice layers, and demonstrate their potential to advance data-driven understanding of cryospheric processes.

Introduction

Graph transformers have proven to be highly effective for modeling complex graph-structured data, with a wide range of applications in real-world scenarios, particularly those involving spatio-temporal patterns. Their ability to capture intricate relationships and dependencies makes them highly valuable in domains such as pedestrian trajectory prediction [1] and traffic prediction [2]. Despite their success, current graph transformer architectures face notable limitations, including overfitting and oversmoothing, a phenomenon where node features become indistinguishable as layers deepen [3]. Additionally, many existing graph transformers are relatively shallow, limiting their ability to effectively capture the complex, long-range dependencies that often emerge in real-world datasets.
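
The partitioned graph construction mentioned in the abstract (overlapping, fully connected local neighborhoods) can be illustrated with a short Python sketch. The details here are assumptions, not the GRIT-LP code: nodes are taken to be ordered along a one-dimensional track, and the `window` and `stride` parameters and the function name `partitioned_edges` are hypothetical.

```python
from itertools import combinations


def partitioned_edges(num_nodes, window=4, stride=2):
    """Return the undirected edges of overlapping, fully connected
    windows over nodes 0..num_nodes-1 (assumed 1D node ordering)."""
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    edges = set()
    start = 0
    while True:
        block = range(start, min(start + window, num_nodes))
        # Fully connect every pair of nodes inside this neighborhood.
        edges.update(combinations(block, 2))
        if start + window >= num_nodes:
            break
        start += stride  # stride < window makes neighborhoods overlap
    return sorted(edges)
```

With `window=4` and `stride=2`, nodes 2 and 3 belong to two neighborhoods, so information can pass between adjacent partitions, while distant pairs such as (0, 4) get no direct edge.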

artificial intelligence, machine learning, spatial reasoning

2511.18716

Country: North America > United States (0.68)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.94)

arXiv.org Artificial Intelligence Oct-13-2025

EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation

Zhu, Tianheng, Yu, Yinfeng, Wang, Liejun, Sun, Fuchun, Zheng, Wendong

This paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3-5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker's potential for real-time multimedia applications.

machine learning, natural language, real time system

2510.08587

Country: Asia > China (0.29)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Architecture > Real Time Systems (0.93)