Perspective-Invariant 3D Object Detection

Liang, Ao, Kong, Lingdong, Lu, Dongyue, Liu, Youquan, Fang, Jian, Zhao, Huaici, Ooi, Wei Tsang

arXiv.org Artificial Intelligence

With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multiple platforms: vehicle, quadruped, and drone, thereby facilitating research in 3D object detection for non-vehicle platforms as well as cross-platform 3D detection. Based on Pi3DET, we propose a novel cross-platform adaptation framework that transfers knowledge from the well-studied vehicle platform to other platforms. This framework achieves perspective-invariant 3D detection through robust alignment at both geometric and feature levels. Additionally, we establish a benchmark to evaluate the resilience and robustness of current 3D detectors in cross-platform scenarios, providing valuable insights for developing adaptive 3D perception systems. Extensive experiments validate the effectiveness of our approach on challenging cross-platform tasks, demonstrating substantial gains over existing adaptation methods. We hope this work paves the way for generalizable and unified 3D perception systems across diverse and complex environments. Our Pi3DET dataset, cross-platform benchmark suite, and annotation toolkit have been made publicly available.
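One concrete way to picture the geometric-level alignment the abstract mentions is gravity-aligning each platform's point cloud before detection, so that clouds from a tilted quadruped or drone share the vehicle platform's upright perspective. The sketch below is illustrative only and assumes roll/pitch angles are available (e.g., from an IMU); it is not the paper's actual method.

```python
import numpy as np

def gravity_align(points: np.ndarray, roll: float, pitch: float) -> np.ndarray:
    """Rotate a LiDAR point cloud of shape (N, 3) into a gravity-aligned frame.

    Undoing the sensor's roll and pitch is one simple realization of
    geometric-level cross-platform alignment (illustrative sketch only).
    """
    cr, sr = np.cos(-roll), np.sin(-roll)
    cp, sp = np.cos(-pitch), np.sin(-pitch)
    # Rotation about x (undo roll), then about y (undo pitch).
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    return points @ (Ry @ Rx).T
```

With zero roll and pitch the transform is the identity, so vehicle-mounted clouds pass through unchanged.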


A Additional Implementation Details

Neural Information Processing Systems

These hyperparameters are fixed throughout all domains. Tab. 1 details the hyperparameters used in MOSS, taken directly from prior work. We include the environment renders in Figure ??.

Table 2: Hyperparameters for MOSS and DQN. These hyperparameters are fixed throughout all domains.

Action repeat              1
Frame repeat               12
Seed frames                4000
n-step returns             3
Mini-batch size            1048
Discount (γ)               0.99
Optimizer                  Adam
Learning rate              0.0001
Agent update frequency     2
Critic target EMA rate (τ)

We made modifications to MOSS to evaluate it in discrete-action settings. Tab. 2 details the hyperparameters used for Double DQN and MOSS in the ViZDoom environment.
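For reference, the listed hyperparameters can be transcribed into a single config dict (the critic target EMA rate τ is truncated in the source and therefore omitted here):

```python
# Hyperparameters for MOSS and DQN as listed above; the critic target
# EMA rate (tau) is truncated in the source and left out.
MOSS_DQN_HPARAMS = {
    "action_repeat": 1,
    "frame_repeat": 12,
    "seed_frames": 4000,
    "n_step_returns": 3,
    "mini_batch_size": 1048,
    "discount_gamma": 0.99,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "agent_update_frequency": 2,
}
```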


A Reward Net Algorithm

Neural Information Processing Systems

In this section, we present the detailed procedures of MRN in Algorithm 1. In Section 4.2, the implicit derivative at iteration k is derived; the intermediate bound follows from the Cauchy-Schwarz inequality, and the last inequality holds by the definition of Lipschitz smoothness. Lemma 2, Theorem 1, and Theorem 2 each place a smoothness assumption on the outer loss; under that assumption, the gradient with respect to the outer loss is Lipschitz continuous. Worse still, it might be difficult for human experts to give preferences between trajectory pairs (e.g., a pair of poor trajectories). This significantly reduces the efficiency of the feedback in the initial stage.
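The trajectory-pair preference setup described above is commonly trained with a Bradley-Terry cross-entropy objective over segment returns. The sketch below illustrates that standard loss; it is not the exact MRN objective, and the soft label convention (1.0, 0.0, 0.5) is an assumption.

```python
import numpy as np

def bradley_terry_loss(r1: np.ndarray, r2: np.ndarray, pref: float) -> float:
    """Cross-entropy preference loss common in preference-based RL.

    r1, r2: per-step rewards predicted by the reward network for the two
    segments of a trajectory pair; pref is 1.0 if segment 1 is preferred,
    0.0 if segment 2 is, 0.5 for a tie. (Illustrative sketch only.)
    """
    s1, s2 = r1.sum(), r2.sum()
    # P(segment 1 preferred) under the Bradley-Terry model.
    p1 = 1.0 / (1.0 + np.exp(s2 - s1))
    eps = 1e-12  # guards the logs against exact 0
    return -(pref * np.log(p1 + eps) + (1 - pref) * np.log(1 - p1 + eps))
```

Note that a pair of equally poor segments yields a near-uniform label, which is exactly the low-information feedback the passage above warns about.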


Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning

Wang, Changhong, Yu, Xudong, Bai, Chenjia, Zhang, Qiaosheng, Wang, Zhen

arXiv.org Artificial Intelligence

In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties of exploration. Recently, offline RL has provided a promising solution by supplying a policy initialized offline, which can then be refined through online interactions. However, existing approaches primarily perform offline and online learning on the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online adaptation to several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve performance on a new task via online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaptation of the Q functions to new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.
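To make the ensemble-Q construction concrete: with successor features, each ensemble member satisfies Q_i(s, a) = ψ_i(s, a) · w for a task weight vector w, and the members can be combined conservatively. The mean-minus-std rule below is one common choice, shown here as an illustrative sketch rather than the paper's exact construction.

```python
import numpy as np

def ensemble_q(psis: list, w: np.ndarray, pessimism: float = 1.0) -> np.ndarray:
    """Combine an ensemble of successor features into a conservative Q.

    Each element of psis has shape (num_actions, d) and satisfies
    Q_i(s, a) = psi_i(s, a) . w, where w is the task's reward weights.
    Mean-minus-std is one standard ensemble rule (illustrative only).
    """
    qs = np.stack([psi @ w for psi in psis])  # (ensemble, num_actions)
    return qs.mean(axis=0) - pessimism * qs.std(axis=0)
```

Swapping in a new task only requires a new w; the learned ψ ensemble is reused, which is what makes fast online adaptation plausible.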


Lessons Learned in Quadruped Deployment in Livestock Farming

Rodríguez-Lera, Francisco J., González-Santamarta, Miguel A., Orden, Jose Manuel Gonzalo, Fernández-Llamas, Camino, Matellán-Olivera, Vicente, Sánchez-González, Lidia

arXiv.org Artificial Intelligence

The livestock industry faces several challenges, including labor-intensive management, the threat of predators and environmental sustainability concerns. Therefore, this paper explores the integration of quadruped robots in extensive livestock farming as a novel application of field robotics. The SELF-AIR project, an acronym for Supporting Extensive Livestock Farming with the use of Autonomous Intelligent Robots, exemplifies this innovative approach. Through advanced sensors, artificial intelligence, and autonomous navigation systems, these robots exhibit remarkable capabilities in navigating diverse terrains, monitoring large herds, and aiding in various farming tasks. This work provides insight into the SELF-AIR project, presenting the lessons learned.


CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning

Sun, Chenyu, Qian, Hangwei, Miao, Chunyan

arXiv.org Machine Learning

Offline reinforcement learning (RL) aims to learn an effective policy from a pre-collected dataset. Most existing works develop sophisticated learning algorithms, with less emphasis on improving the data collection process. Moreover, it is challenging to extend beyond the single-task setting and collect a task-agnostic dataset that allows an agent to perform multiple downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised Data Collection (CUDC) method that expands the feature space using adaptive temporal distances for task-agnostic data collection, ultimately improving learning efficiency and capabilities for multi-task offline RL. To achieve this, CUDC estimates the probability that k-step future states are reachable from the current states, and adapts how many steps into the future the dynamics model should predict. With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can guide itself to collect higher-quality data with curiosity. Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance on various downstream offline RL tasks from the DeepMind Control Suite.
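The adaptive temporal-distance mechanism can be sketched as a simple feedback rule: if k-step futures are very easy to reach (predict), look further ahead; if they are too hard, shrink the horizon. The thresholds and bounds below are illustrative assumptions, not the paper's tuned values.

```python
def adapt_k(k: int, reach_prob: float, low: float = 0.2, high: float = 0.8,
            k_min: int = 1, k_max: int = 10) -> int:
    """Adapt the temporal distance k from an estimated reachability probability.

    High reachability means the k-step future is too predictable, so the
    horizon grows; low reachability shrinks it. (Illustrative sketch;
    thresholds low/high and bounds k_min/k_max are assumed values.)
    """
    if reach_prob > high:
        return min(k + 1, k_max)   # futures too easy: look further ahead
    if reach_prob < low:
        return max(k - 1, k_min)   # futures too hard: shorten the horizon
    return k                       # in the comfortable band: keep k
```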


Automated Gait Generation For Walking, Soft Robotic Quadrupeds

Ketchum, Jake, Schiffer, Sophia, Sun, Muchen, Kaarthik, Pranav, Truby, Ryan L., Murphey, Todd D.

arXiv.org Artificial Intelligence

Gait generation for soft robots is challenging due to the nonlinear dynamics and high-dimensional input spaces of soft actuators. Limitations in soft robotic control and perception force researchers to hand-craft open-loop controllers for gait sequences, which is a non-trivial process. Moreover, short soft-actuator lifespans and natural variations in actuator behavior limit machine learning techniques to settings that can be learned on the same time scales as robot deployment. Lastly, simulation is not always possible, due to heterogeneity and nonlinearity in soft robotic materials, whose dynamics change with wear. We present a sample-efficient, simulation-free method for self-generating soft robot gaits using minimal computation. This technique is demonstrated on a motorized soft robotic quadruped that walks using four legs constructed from 16 "handed shearing auxetic" (HSA) actuators. To manage the dimension of the search space, gaits are composed of two sequential sets of leg motions selected from 7 possible primitives. Pairs of primitives are executed on one leg at a time; we then select the best-performing pair to execute while moving on to subsequent legs. This method, which uses no simulation, sophisticated computation, or user input, consistently generates good translation and rotation gaits in as little as 4 minutes of hardware experimentation, outperforming hand-crafted gaits. This is the first demonstration of completely autonomous gait generation in a soft robot.
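The per-leg search described above can be sketched as a greedy loop: for each leg in turn, try every ordered pair of primitives while keeping earlier legs' chosen pairs fixed, and commit to the best. The `evaluate` callable stands in for a real hardware rollout and is an assumption of this sketch.

```python
from itertools import product

def greedy_gait_search(num_legs, primitives, evaluate):
    """Greedy per-leg search over ordered pairs of motion primitives.

    `evaluate(partial_gait)` is a stand-in for a hardware trial that
    returns a performance score (e.g., translation achieved). Earlier
    legs' pairs stay fixed while the current leg's pair is chosen.
    """
    gait = []
    for _leg in range(num_legs):
        best_pair, best_score = None, float("-inf")
        # Ordered pairs: with 7 primitives this is 49 trials per leg.
        for pair in product(primitives, repeat=2):
            score = evaluate(gait + [pair])
            if score > best_score:
                best_pair, best_score = pair, score
        gait.append(best_pair)
    return gait
```

With 7 primitives and 4 legs this costs 4 x 49 hardware trials, which is consistent with the minutes-scale experimentation budget the abstract reports.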


Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

Zhang, Wanpeng, Li, Yilin, Yang, Boyu, Lu, Zongqing

arXiv.org Artificial Intelligence

In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model changes in the environment explicitly, often requiring impractical prior knowledge. In this paper, we propose a new perspective, positing that non-stationarity can propagate and accumulate through complex causal relationships during state transitions, thereby compounding its complexity and affecting policy learning. We believe this challenge can be more effectively addressed by tracing the causal origin of non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP primarily employs a guided updating mechanism to learn a stable graph representation for states, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in the causal interpretation of non-stationary reinforcement learning, supporting the validity of the causal-origin representation. Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity.