 baseline algorithm


Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots

Neural Information Processing Systems

While optimal control is well studied in the machine learning and robotics communities, less attention has been paid to finding the optimal robot design. This is mainly because co-optimizing design and control in robotics is a challenging problem and, more importantly, because a comprehensive evaluation benchmark for co-optimization does not exist. In this paper, we propose Evolution Gym, the first large-scale benchmark for co-optimizing the design and control of soft robots. In our benchmark, each robot is composed of different types of voxels (e.g., soft, rigid, actuators), resulting in a modular and expressive robot design space. Our benchmark environments span a wide range of tasks, including locomotion on various types of terrains and manipulation.



Learning Causal Structures Using Regression Invariance

Neural Information Processing Systems

We study causal discovery in a multi-environment setting, in which the functional relations for producing the variables from their direct causes remain the same across environments, while the distribution of exogenous noises may vary. We introduce the idea of using the invariance of the functional relations of the variables to their causes across a set of environments for structure learning. We define a notion of completeness for a causal inference algorithm in this setting and prove the existence of such an algorithm by proposing the baseline algorithm. Additionally, we present an alternate algorithm that has significantly improved computational and sample complexity compared to the baseline algorithm. Experimental results show that the proposed algorithm outperforms the other existing algorithms.
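The invariance idea in this abstract can be illustrated with a toy two-variable linear model. The sketch below is only an illustration under stated assumptions, not the paper's baseline or alternate algorithm: the model Y = 2X + noise, the noise scales, the sample size, and the helper names `ols_slope` and `sample_environment` are all hypothetical choices. It shows that regressing a variable on its true cause gives roughly the same coefficient in every environment, while regressing in the anti-causal direction gives a coefficient that shifts when the noise distribution changes.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def sample_environment(noise_x_std, noise_y_std, n, rng):
    """Draw n samples from the assumed model X -> Y with Y = 2*X + noise.

    The functional relation (coefficient 2.0) is fixed; only the
    exogenous noise scales differ between environments.
    """
    xs = [rng.gauss(0.0, noise_x_std) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, noise_y_std) for x in xs]
    return xs, ys


rng = random.Random(0)
# Two hypothetical environments that differ only in the noise scale of X.
env_a = sample_environment(1.0, 1.0, 20000, rng)
env_b = sample_environment(3.0, 1.0, 20000, rng)

# Causal direction (Y on X): both slopes stay close to the true coefficient 2.0.
causal_slopes = [ols_slope(xs, ys) for xs, ys in (env_a, env_b)]

# Anti-causal direction (X on Y): the slope depends on the noise variances,
# so it differs between the two environments.
anticausal_slopes = [ols_slope(ys, xs) for xs, ys in (env_a, env_b)]

print("Y on X:", causal_slopes)
print("X on Y:", anticausal_slopes)
```

Comparing the two pairs of slopes hints at how invariance across environments can help orient an edge: the direction whose regression coefficient stays stable is the candidate causal one in this toy setting.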



SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing Systems

In this paper, we present a hyperparameter-free offline safe IL algorithm, SafeDICE, that learns a safe policy by leveraging the non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of the policy that imitates the demonstrations excluding the non-preferred behavior.


Supplementary Materials - Adaptive Online Replanning with Diffusion Models Siyuan Zhou

Neural Information Processing Systems

In the supplementary, we first discuss the experimental details and hyperparameters in Section A. Section B, and further present the visualization in RLBench in Section C. Finally, we discuss how to MLP with 512 hidden units and Mish activations. The probability ϵ of random actions is set to 0.03 in Stochastic Environments. So the sampled trajectories still lead to the collision. Figure 1 illustrates a problematic sampled trajectory after execution. We further evaluate the performance with different replanning steps in Table 1.
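The random-action probability mentioned above can be sketched as follows. This is a minimal illustration, assuming a small discrete action set; the source states only that ϵ = 0.03 in the stochastic environments, so the function name `perturb_action`, the action labels, and the uniform choice over actions are hypothetical and not taken from the paper.

```python
import random

EPSILON = 0.03  # probability of a random action, as stated in the text


def perturb_action(planned_action, action_space, rng, epsilon=EPSILON):
    """Return a uniformly random action with probability epsilon,
    otherwise return the planned action unchanged.

    A discrete action space is assumed here purely for illustration.
    """
    if rng.random() < epsilon:
        return rng.choice(action_space)
    return planned_action


rng = random.Random(0)
actions = ["left", "right", "up", "down"]  # hypothetical action set

# Execute the same planned action many times and count the deviations.
executed = [perturb_action("up", actions, rng) for _ in range(10000)]
deviation_fraction = sum(a != "up" for a in executed) / len(executed)
print("fraction of executed actions that differ from the plan:", deviation_fraction)
```

Because the random draw can coincide with the planned action, the observed fraction of deviations in this sketch is somewhat below ϵ. Even such rare perturbations can push an executed trajectory away from the sampled plan, which is the kind of situation replanning is meant to handle.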