Robots
Provably Safe Reinforcement Learning with Step-wise Violation Constraints Institute for Interdisciplinary Information Sciences, Tsinghua University
We investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we focus on stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications that need to ensure safety in all decision steps but may not always possess safe actions, e.g., robot control and autonomous driving.
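As a toy illustration of the distinction the abstract draws (not code from the paper), a step-wise constraint bounds the number of unsafe states visited at any decision step, rather than an episode-level cumulative cost. The function name and state convention below are illustrative assumptions:

```python
# Illustrative sketch: step-wise safety accounting in an episodic rollout.
# A step-wise constraint bounds this count at every step, whereas a
# cumulative-cost constraint only bounds an episode-level total.

def stepwise_violations(trajectory, is_unsafe):
    """Count how many visited states are unsafe."""
    return sum(1 for state in trajectory if is_unsafe(state))

# Toy example: states are positions on a line; positions >= 3 are unsafe.
traj = [0, 1, 2, 3, 4, 2]
print(stepwise_violations(traj, lambda s: s >= 3))  # -> 2
```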
TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight
Semantic Scene Completion (SSC) aims to perform geometric completion and semantic segmentation simultaneously. Despite the promising results achieved by existing studies, the inherently ill-posed nature of the task presents significant challenges in diverse driving scenarios. This paper introduces TALoS, a novel test-time adaptation approach for SSC that exploits information readily available in driving environments. Specifically, we leverage the fact that observations made at one moment can serve as Ground Truth (GT) for scene completion at another moment. Given the characteristics of the LiDAR sensor, an observation of an object at a certain location confirms both 1) the occupation of that location and 2) the absence of obstacles along the line of sight from the LiDAR to that point.
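The line-of-sight cue described above can be sketched as a simple 2D ray-labeling routine: the voxel at the LiDAR return is occupied, and every voxel traversed before it is free. The function name, grid convention, and label values are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def los_pseudo_labels(grid_shape, sensor, hit):
    """Label cells along the line of sight from `sensor` to `hit`:
    the hit cell is occupied (1), cells traversed before it are free (0),
    and everything else stays unknown (-1). A simplified 2D sketch of the
    observation-as-GT idea, not the paper's method."""
    labels = np.full(grid_shape, -1, dtype=int)
    sensor = np.asarray(sensor, dtype=float)
    hit = np.asarray(hit, dtype=float)
    n = int(np.max(np.abs(hit - sensor))) * 2 + 1  # dense ray sampling
    for t in np.linspace(0.0, 1.0, n, endpoint=False):
        i, j = np.round(sensor + t * (hit - sensor)).astype(int)
        labels[i, j] = 0  # free space along the ray
    labels[tuple(np.round(hit).astype(int))] = 1  # occupied at the return
    return labels

labels = los_pseudo_labels((8, 8), sensor=(0, 0), hit=(0, 5))
# cells (0,0)..(0,4) are labeled free, (0,5) occupied, the rest unknown
```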
00989c20ff1386dc386d8124ebcba1a5-AuthorFeedback.pdf
We thank all the reviewers for their helpful feedback and positive view of our work. We believe that these additions address all of the main reviewer concerns. The actions are "turn left," "turn right," ... Hausman et al. is discussed at line 95 of the ... The plain TE only uses the imitation learning loss. The Duan et al. architecture fails in ... In our results we use behavioral cloning, and we plan to try IRL methods such as GAIL in future work. The Duan et al. architecture performs well in this ...
OmniJARVIS Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce control commands directly, OmniJARVIS takes a different path, ensuring both strong reasoning and efficient decision-making capabilities via unified tokenization of multimodal interaction data.
Hugging Face's new humanoid robot HopeJr may cost only $3,000
If you want a robot assistant to live in your home and act vaguely like a human, you might be in luck. Hugging Face, a company that largely specializes in machine learning but has branched out into robotics recently, has a new humanoid robot called HopeJr coming out potentially by the end of 2025. As you can see in a video posted to X, it has a pretty wide range of movement capabilities. Per TechCrunch, it is specifically capable of 66 independent movements. The caption on the video claims it is capable of walking and "manipulating many objects," though we don't get to see the bot walk in the video.
CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations
We thank the reviewers for their comments. However, Section 2.4 of the supplement demonstrates that a single model trained on all three shape categories still gives ... How does CaSPR generalize to unseen categories? Yet, this is a formidable open problem in computer vision and ML beyond the scope of our work. In CaSPR, we focus on many other problems of importance by leveraging a category-level prior on object shape. Prior spatiotemporal (Occupancy Flow) and point cloud reconstruction (PointFlow) methods lack this step.
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs Allen Nie
We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. AutoDiff frameworks, like PyTorch, enable efficient end-to-end optimization of differentiable systems. However, general computational workflows can be non-differentiable and involve rich feedback (e.g.
Risk-Sensitive Control as Inference with Rényi Divergence
This paper introduces risk-sensitive control as inference (RCaI), which extends control as inference (CaI) by using Rényi-divergence variational inference. RCaI is shown to be equivalent to log-probability-regularized risk-sensitive control, an extension of maximum-entropy (MaxEnt) control. We also prove that the risk-sensitive optimal policy can be obtained by solving a soft Bellman equation, which reveals several equivalences between RCaI, MaxEnt control, the optimal posterior for CaI, and linearly solvable control. Moreover, based on RCaI, we derive risk-sensitive reinforcement learning (RL) methods: policy gradient and soft actor-critic. As the risk-sensitivity parameter vanishes, we recover risk-neutral CaI and RL, which means that RCaI is a unifying framework. Furthermore, we give another risk-sensitive generalization of MaxEnt control using Rényi-entropy regularization. We show that in both of our extensions the optimal policies have the same structure, even though the derivations are very different.
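For context, the soft Bellman equation the abstract refers to has, in the risk-neutral MaxEnt limit, the following standard form (notation assumed here, not taken from the paper):

```latex
% MaxEnt (risk-neutral) soft Bellman equation; notation is assumed:
% r is the reward, \gamma the discount, \alpha > 0 the temperature.
Q(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s,a)}\!\left[ V(s') \right],
\qquad
V(s) = \alpha \log \sum_{a} \exp\!\left( \tfrac{1}{\alpha} Q(s,a) \right),
```

with optimal policy $\pi(a \mid s) \propto \exp\!\left(\tfrac{1}{\alpha} Q(s,a)\right)$; RCaI is described as generalizing this structure via a risk-sensitivity parameter.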