Goto

Collaborating Authors

 Shi, Weixian


Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation

arXiv.org Artificial Intelligence

Although current Large Language Models (LLMs) exhibit impressive capabilities, performing complex real-world tasks still requires tool learning. Mainstream methods, such as CoT/ReAct, rely on step-by-step tool invocation to interact with external environments, but they are limited in perceptual scope and lack adequate task-planning capability. To address these limitations, other studies introduce the first Search-based Decision Tree (DFSDT), which still suffers from the high computational cost. In this paper, we introduce a novel parallel tool invocation paradigm, DTA-Llama (Divide-Then-Aggregate Llama). First, we transform traditional tree-based tool search paths into Directed Acyclic Graph (DAG) structure, generating a high-quality parallel tool invocation dataset. The DTA-Llama is then trained on the dataset to learn to iteratively divide the current task into several parallel tool invocation sub-tasks and aggregate the invocation results to decide the next actions. Furthermore, we introduce an efficient inference framework inspired by the Process/Threads mechanism when applying the DTA-Llama to practical tasks. Experimental results show that our approach substantially enhances task performance while reducing token consumption and inference time. Llama2-7B, using our method, is comparable to the official parallel function calling method of GPT-3.5. The relevant code, dataset, and model weights are available at https://corn0205.github.io/


Spatiotemporal Attention Enhances Lidar-Based Robot Navigation in Dynamic Environments

arXiv.org Artificial Intelligence

So inferring the scene dynamics from sensor readings without explicit object tracking is a pivotal aspect of foresighted navigation among pedestrians. In this paper, we introduce a spatiotemporal attention pipeline for enhanced navigation based on 2D lidar sensor readings. This pipeline is complemented by a novel lidar-state representation that emphasizes dynamic obstacles over static ones. Subsequently, the attention mechanism enables selective scene perception across both space and time, resulting in improved overall navigation performance within dynamic scenarios. We thoroughly evaluated the approach in different scenarios and simulators, finding good generalization to unseen environments. The results demonstrate outstanding performance compared to state-of-the-art methods, thereby enabling the seamless deployment of the learned controller on a real robot. Figure 1: Our pipeline for learning a robot navigation controller based on lidar.


Subgoal-Driven Navigation in Dynamic Environments Using Attention-Based Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Collision-free, goal-directed navigation in environments containing unknown static and dynamic obstacles is still a great challenge, especially when manual tuning of navigation policies or costly motion prediction needs to be avoided. In this paper, we therefore propose a subgoal-driven hierarchical navigation architecture that is trained with deep reinforcement learning and decouples obstacle avoidance and motor control. In particular, we separate the navigation task into the prediction of the next subgoal position for avoiding collisions while moving toward the final target position, and the prediction of the robot's velocity controls. By relying on 2D lidar, our method learns to avoid obstacles while still achieving goal-directed behavior as well as to generate low-level velocity control commands to reach the subgoals. In our architecture, we apply the attention mechanism on the robot's 2D lidar readings and compute the importance of lidar scan segments for avoiding collisions. As we show in simulated and real-world experiments with a Turtlebot robot, our proposed method leads to smooth and safe trajectories among humans and significantly outperforms a state-of-the-art approach in terms of success rate. A supplemental video describing our approach is available online.