 perception module


Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

Neural Information Processing Systems

In this work, we propose a unified framework, called Visual Reasoning with Differentiable Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects and their interactions from videos and language. This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine. The visual perception module parses each video frame into object-centric trajectories and represents them as latent scene representations. The concept learner grounds visual concepts (e.g., color, shape, and material) from these object-centric representations based on the language, thus providing prior knowledge for the physics engine. The differentiable physics model, implemented as an impulse-based differentiable rigid-body simulator, performs differentiable physical simulation based on the grounded concepts to infer physical properties, such as mass, restitution, and velocity, by fitting the simulated trajectories to the video observations. Consequently, these learned concepts and physical models can explain what we have seen and imagine what is about to happen in future and counterfactual scenarios.
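The idea of inferring a physical property by fitting simulated trajectories to observations can be illustrated with a toy sketch. This is not the VRDP engine: it assumes a single ball bouncing on a floor in one dimension, recovers only the restitution coefficient, and propagates the derivative of the trajectory with respect to restitution by hand (forward-mode differentiation) instead of using an automatic differentiation library. The names `simulate` and `fit_restitution`, and all constants, are hypothetical.

```python
def simulate(e, x0=1.0, v0=0.0, dt=0.01, steps=80, g=9.81):
    """Simulate a ball dropped onto a floor at x = 0.

    Returns the positions and their derivatives with respect to the
    restitution coefficient e (hand-written forward-mode tangents).
    """
    x, v = x0, v0
    dx, dv = 0.0, 0.0  # d(x)/d(e) and d(v)/d(e)
    xs, dxs = [], []
    for _ in range(steps):
        v -= g * dt          # gravity does not depend on e, so dv is unchanged
        x += v * dt
        dx += dv * dt
        if x < 0.0 and v < 0.0:
            # Impulse-style bounce: v_new = -e * v_old.
            # Tangent: d(v_new)/d(e) = -v_old - e * d(v_old)/d(e).
            dv = -v - e * dv
            v = -e * v
            x, dx = 0.0, 0.0  # clamp to the floor; the clamp is constant in e
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def fit_restitution(observed, e_init=0.5, lr=1.0, iters=50):
    """Gradient descent on the mean squared trajectory error."""
    e = e_init
    n = len(observed)
    for _ in range(iters):
        xs, dxs = simulate(e, steps=n)
        grad = sum(2.0 * (x - o) * d for x, o, d in zip(xs, observed, dxs)) / n
        e -= lr * grad
    return e


if __name__ == "__main__":
    # Synthetic "video observations" generated with a known restitution.
    observed, _ = simulate(0.7)
    print(fit_restitution(observed))
```

The horizon is deliberately short so that only one bounce occurs; with several bounces the collision times depend on the restitution and the loss is no longer a simple quadratic, which is one reason a full differentiable rigid-body simulator is considerably more involved than this sketch.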







Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction

arXiv.org Artificial Intelligence

For full-size humanoid robots, even with recent advances in reinforcement learning-based control, achieving reliable locomotion on complex terrains, such as long staircases, remains challenging. In such settings, limited perception, ambiguous terrain cues, and insufficient adaptation of gait timing can cause even a single misplaced or mistimed step to result in rapid loss of balance. We introduce a perceptive locomotion framework that merges terrain sensing, gait regulation, and whole-body control into a single reinforcement learning policy. A downward-facing depth camera mounted under the base observes the support region around the feet, and a compact U-Net reconstructs a dense egocentric height map from each frame in real time, operating at the same frequency as the control loop. The perceptual height map, together with proprioceptive observations, is processed by a unified policy that produces joint commands and a global stepping-phase signal, allowing gait timing and whole-body posture to be adapted jointly to the commanded motion and local terrain geometry. We further adopt a single-stage successive teacher-student training scheme for efficient policy learning and knowledge transfer. Experiments conducted on a 31-DoF, 1.65 m humanoid robot demonstrate robust locomotion in both simulation and real-world settings, including forward and backward stair ascent and descent, as well as crossing a 46 cm gap.



Your Ride, Your Rules: Psychology and Cognition Enabled Automated Driving Systems

arXiv.org Artificial Intelligence

Despite rapid advances in autonomous driving technology, current autonomous vehicles (AVs) primarily respond to external traffic conditions and treat humans as passive occupants, lacking mechanisms for active adaptation and collaboration. This limitation constrains their ability to personalize driving behavior to human expectations and hinders effective navigation of ambiguous traffic scenarios that could benefit from leveraging the occupant's advanced cognitive input, resulting in increased delays and potential safety risks. In the long term, this inadequacy undermines occupant trust and hinders the widespread adoption of AV technologies. This research proposes PACE-ADS (Psychology and Cognition Enabled Automated Driving Systems): a human-centered autonomy framework that enables AVs to sense, interpret, and respond to both external traffic conditions and internal occupant states. PACE-ADS is built on an agentic workflow where three foundation model agents collaborate: the Driver Agent interprets the external environment; the Psychologist Agent decodes passive psychological signals (e.g., facial expressions) and active cognitive inputs (e.g., verbal commands); and the Coordinator Agent synthesizes these inputs to generate high-level driving behavior decisions and parameters that enhance responsiveness in ambiguous scenarios and personalize the ride. PACE-ADS is designed to complement, rather than replace, conventional AV modules. It operates at the low-frequency semantic planning layer while delegating low-level, high-frequency control to the vehicle's native systems.


ATOM-CBF: Adaptive Safe Perception-Based Control under Out-of-Distribution Measurements

arXiv.org Artificial Intelligence

Ensuring the safety of real-world systems is challenging, especially when they rely on learned perception modules to infer the system state from high-dimensional sensor data. These perception modules are vulnerable to epistemic uncertainty, often failing when encountering out-of-distribution (OoD) measurements not seen during training. To address this gap, we introduce ATOM-CBF (Adaptive-To-OoD-Measurement Control Barrier Function), a novel safe control framework that explicitly computes and adapts to the epistemic uncertainty from OoD measurements, without the need for ground-truth labels or information on distribution shifts. Our approach features two key components: (1) an OoD-aware adaptive perception error margin and (2) a safety filter that integrates this adaptive error margin, enabling the filter to adjust its conservatism in real time. We provide empirical validation in simulations, demonstrating that ATOM-CBF maintains safety for an F1Tenth vehicle with LiDAR scans and a quadruped robot with RGB images.
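How a safety filter can tighten as a perception error margin grows can be sketched on a one-dimensional toy system. This is not the ATOM-CBF formulation: it assumes single-integrator dynamics (the control input is the velocity), a safe set defined by a position limit `x_max`, and a margin that grows linearly with a hypothetical OoD score. The functions `adaptive_margin` and `safety_filter` and all gains are illustrative assumptions.

```python
def adaptive_margin(ood_score, base_margin=0.05, gain=0.5):
    """Hypothetical error margin: a fixed base plus a term that grows
    with an out-of-distribution score (assumed non-negative when OoD)."""
    return base_margin + gain * max(0.0, ood_score)


def safety_filter(u_nom, x_hat, margin, x_max=1.0, alpha=2.0):
    """Robust control barrier function filter for x_dot = u.

    Barrier: h(x) = x_max - x, with condition h_dot + alpha * h >= 0,
    i.e. u <= alpha * (x_max - x). If the true state satisfies
    |x - x_hat| <= margin, then enforcing
        u <= alpha * (x_max - x_hat - margin)
    guarantees the condition for the true state as well.
    """
    u_bound = alpha * (x_max - x_hat - margin)
    # Closed-form solution of "stay as close to u_nom as the bound allows".
    return min(u_nom, u_bound)


if __name__ == "__main__":
    x_hat, u_nom = 0.5, 1.0
    for score in (0.0, 0.3, 0.6):
        m = adaptive_margin(score)
        print(score, m, safety_filter(u_nom, x_hat, m))
```

In this sketch a larger OoD score yields a larger margin and therefore a smaller admissible command, which is the sense in which the filter becomes more conservative when the perception estimate is less trustworthy.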