Dai, Weichen
SLC$^2$-SLAM: Semantic-guided Loop Closure with Shared Latent Code for NeRF SLAM
Ming, Yuhang, Ma, Di, Dai, Weichen, Yang, Han, Fan, Rui, Zhang, Guofeng, Kong, Wanzeng
Targeting the notorious cumulative drift errors in NeRF SLAM, we propose a Semantic-guided Loop Closure with Shared Latent Code, dubbed SLC$^2$-SLAM. Specifically, we argue that the latent codes stored in many NeRF SLAM systems are not fully exploited, as they are only used for better reconstruction. In this paper, we propose a simple yet effective way to detect potential loops using the same latent codes as local features. To further improve loop detection performance, we use semantic information, which is also decoded from the same latent codes, to guide the aggregation of local features. Finally, with the potential loops detected, we close them with a graph optimization followed by bundle adjustment to refine both the estimated poses and the reconstructed scene. To evaluate the performance of SLC$^2$-SLAM, we conduct extensive experiments on the Replica and ScanNet datasets. Our proposed semantic-guided loop closure significantly outperforms the pre-trained NetVLAD and ORB combined with Bag-of-Words, which are used in all other NeRF SLAM systems with loop closure. As a result, SLC$^2$-SLAM also demonstrates better tracking and reconstruction performance, especially in larger scenes with more loops, such as those in ScanNet.
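The loop detection idea in this abstract (local features aggregated under semantic guidance, then compared across frames) can be illustrated with a minimal sketch. The abstract does not specify the aggregation or matching rule, so everything below is an assumption made for illustration: per-class mean pooling stands in for the semantic-guided aggregation, cosine similarity with a fixed threshold stands in for the matching step, and the names `aggregate`, `detect_loops`, `threshold`, and `min_gap` are hypothetical rather than taken from the paper.

```python
import math

def aggregate(features, labels, num_classes):
    """Assumed aggregation: mean-pool local features per semantic class,
    concatenate the per-class means, and L2-normalize the result."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, lab in zip(features, labels):
        counts[lab] += 1
        for i, value in enumerate(feat):
            sums[lab][i] += value
    descriptor = []
    for c in range(num_classes):
        n = max(counts[c], 1)  # classes with no samples contribute zeros
        descriptor.extend(value / n for value in sums[c])
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    return [v / norm for v in descriptor]

def cosine(a, b):
    """Cosine similarity of two descriptors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def detect_loops(descriptors, threshold=0.9, min_gap=2):
    """Return frame pairs (i, j), j >= i + min_gap, whose similarity
    reaches the (hypothetical) threshold."""
    loops = []
    for i in range(len(descriptors)):
        for j in range(i + min_gap, len(descriptors)):
            if cosine(descriptors[i], descriptors[j]) >= threshold:
                loops.append((i, j))
    return loops

# Toy data: each frame is (local features, semantic label per feature).
frames = [
    ([[1.0, 0.0], [0.0, 1.0]], [0, 1]),
    ([[0.0, 1.0], [1.0, 0.0]], [0, 1]),
    ([[-1.0, 0.0], [0.0, -1.0]], [0, 1]),
    ([[1.0, 0.1], [0.1, 1.0]], [0, 1]),  # close to frame 0
]
descriptors = [aggregate(f, l, num_classes=2) for f, l in frames]
loops = detect_loops(descriptors)
print(loops)
```

In this toy data only frames 0 and 3 are similar enough to be reported as a candidate loop. The `min_gap` parameter keeps temporally adjacent frames from matching each other trivially. Closing the loop (graph optimization followed by bundle adjustment) is outside the scope of this sketch.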
KALE-LM: Unleash The Power Of AI For Science Via Knowledge And Logic Enhanced Large Model
Dai, Weichen, Chen, Yezeng, Dai, Zijie, Huang, Zhijie, Liu, Yubo, Pan, Yixuan, Song, Baiyang, Zhong, Chengli, Li, Xinhe, Wang, Zeyu, Feng, Zhuoying, Zhou, Yi
In recent years, the rapid development of artificial intelligence (AI) technology has enabled it to achieve, and in some cases surpass, top human performance in various high-intelligence tasks. These include speech [1], facial [2], and image [3] recognition, games such as Go [4], StarCraft [5], and Dota2 [6], as well as tasks related to text [7], image [8], and video generation, machine translation [9], knowledge-based question answering [10], debates, and solving advanced mathematical problems [11]. Science is one of the most important fields for the application of AI. As the crown jewel of human civilization and the cornerstone of various industries, science is a core driver of human progress, and its development can significantly accelerate and even revolutionize many fields. Historically, there have been three major research paradigms in science: the first paradigm, experiment, which emerged from Newtonian empiricism; the second paradigm, theory, born from Einstein's rationalism; and the third paradigm, simulation/computation, which arose from the third industrial revolution, the computation and information revolution.
AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition
Ming, Yuhang, Ma, Jian, Yang, Xingrui, Dai, Weichen, Peng, Yong, Kong, Wanzeng
We present AEGIS-Net, a novel indoor place recognition model that takes in RGB point clouds and generates global place descriptors by aggregating lower-level color and geometry features with higher-level implicit semantic features. Rather than relying on simple feature concatenation, it employs self-attention modules to select the most important local features that best describe an indoor place. AEGIS-Net consists of a semantic encoder, a semantic decoder, and an attention-guided feature embedding. The model is trained in a two-stage process, with the first stage focusing on an auxiliary semantic segmentation task and the second on the place recognition task. We evaluate AEGIS-Net on the ScanNetPR dataset and compare its performance with a pre-deep-learning feature-based method and five state-of-the-art deep-learning-based methods. AEGIS-Net achieves exceptional performance and outperforms all six methods.
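As a rough illustration of attention-guided selection versus plain concatenation or averaging, the sketch below pools local features with softmax attention weights. It is a simplified stand-in and not the AEGIS-Net architecture: the abstract does not describe the self-attention modules in detail, so the single scoring vector `query` and the function `attention_pool` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Score each local feature against `query` (a hypothetical learned
    vector), then return the attention-weighted sum and the weights."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[i] for w, feat in zip(weights, features))
              for i in range(dim)]
    return pooled, weights

local_features = [[1.0, 0.0], [0.0, 1.0]]

# A zero query scores all features equally, so pooling reduces to a mean.
uniform_pooled, uniform_weights = attention_pool(local_features, [0.0, 0.0])

# A query aligned with the first feature makes that feature dominate.
focused_pooled, focused_weights = attention_pool(local_features, [10.0, 0.0])
```

The contrast between the two calls shows the design choice: with uninformative scores the descriptor is an ordinary average, while informative scores let a few local features dominate the global descriptor.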