Plotting

 Wang, Chen


KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph

arXiv.org Artificial Intelligence

Large language model (LLM) has achieved outstanding performance on various downstream tasks with its powerful natural language understanding and zero-shot capability, but LLM still suffers from knowledge limitation. Especially in scenarios that require long logical chains or complex reasoning, the hallucination and knowledge limitation of LLM limit its performance in question answering (QA). In this paper, we propose a novel framework KnowledgeNavigator to address these challenges by efficiently and accurately retrieving external knowledge from knowledge graph and using it as a key factor to enhance LLM reasoning. Specifically, KnowledgeNavigator first mines and enhances the potential constraints of the given question to guide the reasoning. Then it retrieves and filters external knowledge that supports answering through iterative reasoning on knowledge graph with the guidance of LLM and the question. Finally, KnowledgeNavigator constructs the structured knowledge into effective prompts that are friendly to LLM to help its reasoning. We evaluate KnowledgeNavigator on multiple public KGQA benchmarks, the experiments show the framework has great effectiveness and generalization, outperforming previous knowledge graph enhanced LLM methods and is comparable to the fully supervised models.


SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

arXiv.org Artificial Intelligence

Simultaneous localization and mapping (SLAM) is a fundamental task for numerous applications such as autonomous navigation and exploration. Despite many SLAM datasets have been released, current SLAM solutions still struggle to have sustained and resilient performance. One major issue is the absence of high-quality datasets including diverse all-weather conditions and a reliable metric for assessing robustness. This limitation significantly restricts the scalability and generalizability of SLAM technologies, impacting their development, validation, and deployment. To address this problem, we present SubT-MRS, an extremely challenging real-world dataset designed to push SLAM towards all-weather environments to pursue the most robust SLAM performance. It contains multi-degraded environments including over 30 diverse scenes such as structureless corridors, varying lighting conditions, and perceptual obscurants like smoke and dust; multimodal sensors such as LiDAR, fisheye camera, IMU, and thermal camera; and multiple locomotions like aerial, legged, and wheeled robots. We develop accuracy and robustness evaluation tracks for SLAM and introduced novel robustness metrics. Comprehensive studies are performed, revealing new observations, challenges, and opportunities for future research.


Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

arXiv.org Artificial Intelligence

Building general-purpose robots that can operate seamlessly, in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. Unfortunately, however, most existing robotic systems have been constrained - having been designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively-labeled data, rely on task-specific models, have numerous generalization issues when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and also exploring (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.


iSLAM: Imperative SLAM

arXiv.org Artificial Intelligence

Simultaneous Localization and Mapping (SLAM) stands as one of the critical challenges in robot navigation. A SLAM system often consists of a front-end component for motion estimation and a back-end system for eliminating estimation drift. Recent advancements suggest that data-driven methods are highly effective for front-end tasks, while geometry-based methods continue to be essential in the back-end processes. However, such a decoupled paradigm between the data-driven front-end and geometry-based back-end can lead to sub-optimal performance, consequently reducing system capabilities and generalization potential. To solve this problem, we proposed a novel self-supervised imperative learning framework, named imperative SLAM (iSLAM), which fosters reciprocal correction between the front-end and back-end, thus enhancing performance without necessitating any external supervision. Specifically, we formulate the SLAM problem as a bilevel optimization so that the front-end and back-end are bidirectionally connected. As a result, the front-end model can learn global geometric knowledge obtained through pose graph optimization by back-propagating the residuals from the back-end component. We showcase the effectiveness of this new framework through an application of stereo-inertial SLAM. The experiments show that the iSLAM training strategy achieves an accuracy improvement of 22% on average over a baseline model. To the best of our knowledge, iSLAM is the first SLAM system showing that the front-end and back-end can mutually correct each other in a self-supervised manner.


Non-iterative SLAM for Warehouse Robots Using Ground Textures

arXiv.org Artificial Intelligence

We present a novel visual SLAM method for the warehouse robot with a single downward-facing camera using ground textures. Traditional methods resort to feature matching or point registration for pose optimization, which easily suffers from repetitive features and poor texture quality. In this paper, we present a robust kernel cross-correlator for robust image-level registration. Compared with the existing methods that often use iterative solutions, our method, named non-iterative visual SLAM (NI-SLAM), has a closed-form solution with a complexity of $O(n\log n)$. This allows it to run very efficiently, yet still provide better accuracy and robustness than the state-of-the-art methods. In the experiments, we demonstrate that it achieves 78% improvement over the state-of-the-art systems for indoor and outdoor localization. We have successfully tested it on warehouse robots equipped with a single downward camera, showcasing its product-ready superiority in a real operating area.


Group-Aware Interest Disentangled Dual-Training for Personalized Recommendation

arXiv.org Artificial Intelligence

--Personalized recommender systems aim to predict users' preferences for items. It has become an indispensable part of online services. Online social platforms enable users to form groups based on their common interests. The users' group participation on social platforms reveals their interests and can be utilized as side information to mitigate the data sparsity and cold-start problem in recommender systems. Users join different groups out of different interests. In this paper, we generate group representation from the user's interests and propose IGRec (Interest-based Group enhanced Recommendation) to utilize the group information accurately. It consists of four modules. We conduct extensive experiments on three publicly available datasets. Results show IGRec can effectively alleviate the data sparsity problem and enhance the recommender system with interest-based group representation. Recommender systems (RS) [1]-[3] are becoming indispensable to web applications owing to their prominent ability in user retention [4] and commercial conversion [5]. Data sparsity and cold-start problems [6]-[8] are still obstacles that most RS suffer from.


NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

arXiv.org Artificial Intelligence

We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals. Through this interface, humans communicate their intended objects of interest and actions to the robots using electroencephalography (EEG). Our novel system demonstrates success in an expansive array of 20 challenging, everyday household activities, including cooking, cleaning, personal care, and entertainment. The effectiveness of the system is improved by its synergistic integration of robot learning algorithms, allowing for NOIR to adapt to individual users and predict their intentions. Our work enhances the way humans interact with robots, replacing traditional channels of interaction with direct, neural communication. Project website: https://noir-corl.github.io/.


VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a vision-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Videos and code at https://voxposer.github.io


EVBattery: A Large-Scale Electric Vehicle Dataset for Battery Health and Capacity Estimation

arXiv.org Artificial Intelligence

Electric vehicles (EVs) play an important role in reducing carbon emissions. As EV adoption accelerates, safety issues caused by EV batteries have become an important research topic. In order to benchmark and develop data-driven methods for this task, we introduce a large and comprehensive dataset of EV batteries. Our dataset includes charging records collected from hundreds of EVs from three manufacturers over several years. Our dataset is the first large-scale public dataset on real-world battery data, as existing data either include only several vehicles or is collected in the lab environment. Meanwhile, our dataset features two types of labels, corresponding to two key tasks - battery health estimation and battery capacity estimation. In addition to demonstrating how existing deep learning algorithms can be applied to this task, we further develop an algorithm that exploits the data structure of battery systems. Our algorithm achieves better results and shows that a customized method can improve model performances. We hope that this public dataset provides valuable resources for researchers, policymakers, and industry professionals to better understand the dynamics of EV battery aging and support the transition toward a sustainable transportation system.


Unified Pretraining for Recommendation via Task Hypergraphs

arXiv.org Artificial Intelligence

Although pretraining has garnered significant attention and popularity in recent years, its application in graph-based recommender systems is relatively limited. It is challenging to exploit prior knowledge by pretraining in widely used ID-dependent datasets. On one hand, user-item interaction history in one dataset can hardly be transferred to other datasets through pretraining, where IDs are different. On the other hand, pretraining and finetuning on the same dataset leads to a high risk of overfitting. In this paper, we propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs. For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation. Experimental results on three benchmark datasets verify the superiority of UPRTH. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework.