reality gap


Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs

Wei, Fei, Chen, Daoyuan, Wang, Ce, Huang, Yilun, Chen, Yushuo, Pan, Xuchen, Li, Yaliang, Ding, Bolin

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel as passive responders, but teaching them to be proactive, goal-oriented partners--a critical capability in high-stakes domains--remains a major challenge. Current paradigms either myopically optimize single-turn attributes or rely on brittle, high-cost user simulators, creating a persistent "reality gap". To bridge this gap, we introduce Learn-to-Ask, a general, simulator-free framework for learning and deploying proactive dialogue agents directly from offline expert data, bypassing the need to model complex user dynamics. Our key insight is to reframe the offline policy learning problem by leveraging the observed future of each expert trajectory. This allows us to infer a dense, turn-by-turn reward signal grounded in the expert's revealed strategy, decomposing the intractable long-horizon problem into a series of supervised learning tasks, and training a policy to output a structured (action, state assessment) tuple, governing both what to ask and, crucially, when to stop. To ensure reward fidelity, our Automated Grader Calibration pipeline systematically purges noise from the LLM-based reward model with minimal human supervision. Empirically, we demonstrate the efficacy of Learn-to-Ask on a real-world medical dataset, using LLMs of varying sizes up to 32B. Our approach culminates in the successful deployment of LLMs into a live, large-scale online AI service. In rigorous in-house evaluations, our launched model achieved performance superior even to human experts, proving our framework's ability to translate offline data into tangible, real-world impact. We hope this work provides a practical and economically viable blueprint for transforming passive LLMs into proactive, goal-oriented LLM applications. Across industries such as healthcare, law, and finance, numerous goal-oriented conversations take place every day between human experts and their clients (Wang et al., 2025; Yang et al., 2023).
This vast corpus of dialogue data represents a largely untapped goldmine, containing implicit expert-driven strategies for navigating complex, information-seeking scenarios. While organizations possess these valuable data assets, Large Language Models (LLMs) are seldom trained to harness them effectively. Instead, their default behavior remains largely passive, limiting their potential as truly collaborative and proactive partners. In high-stakes domains, this passivity is a critical failure - an intelligent LLM application should not merely answer questions but proactively form a policy to gather information and drive the conversation towards a designated goal. Two main paradigms have emerged to instill such proactivity, yet both struggle with a significant "reality gap". The first myopically optimizes for local, single-turn attributes and fails to learn a coherent, sequential policy that accounts for temporal dependencies in a conversation.
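The decomposition described above can be sketched in a few lines. This is a hypothetical illustration of the core idea only, not the paper's implementation: for each prefix of an offline expert dialogue, the observed future supplies the label, with the expert's next turn as the action and "stop" labeled only at the final turn. The function name and data layout are assumptions.

```python
# Hypothetical sketch: turning offline expert dialogues into per-turn
# supervised (action, state-assessment) targets, in the spirit of the
# Learn-to-Ask framing. Names and data layout are illustrative assumptions.

def make_turn_targets(trajectory):
    """trajectory: ordered list of expert turns, e.g. ["ask symptom onset", ...].

    Each dialogue prefix becomes one supervised example: the next expert
    turn is the action label, and the state assessment is 'stop' only at
    the final turn, 'continue' otherwise.
    """
    targets = []
    for t in range(len(trajectory)):
        targets.append({
            "context": trajectory[:t],                       # history so far
            "action": trajectory[t],                         # expert's next move
            "state": "stop" if t == len(trajectory) - 1 else "continue",
        })
    return targets

examples = make_turn_targets(["ask onset", "ask severity", "give advice"])
```

In this toy example, a three-turn expert dialogue yields three supervised examples, and only the last one carries the "stop" assessment, which is what lets a trained policy learn when to end the conversation.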


A Multi-Modality Evaluation of the Reality Gap in Autonomous Driving Systems

Lambertenghi, Stefano Carlo, Valdez, Mirena Flores, Stocco, Andrea

arXiv.org Artificial Intelligence

Simulation-based testing is a cornerstone of Autonomous Driving System (ADS) development, offering safe and scalable evaluation across diverse driving scenarios. However, discrepancies between simulated and real-world behavior, known as the reality gap, challenge the transferability of test results to deployed systems. In this paper, we present a comprehensive empirical study comparing four representative testing modalities: Software-in-the-Loop (SiL), Vehicle-in-the-Loop (ViL), Mixed-Reality (MR), and full real-world testing. Using a small-scale physical vehicle equipped with real sensors (camera and LiDAR) and its digital twin, we implement each setup and evaluate two ADS architectures (modular and end-to-end) across diverse indoor driving scenarios involving real obstacles, road topologies, and indoor environments. We systematically assess the impact of each testing modality along three dimensions of the reality gap: actuation, perception, and behavioral fidelity. Our results show that while SiL and ViL setups simplify critical aspects of real-world dynamics and sensing, MR testing improves perceptual realism without compromising safety or control. Importantly, we identify the conditions under which failures do not transfer across testing modalities and isolate the underlying dimensions of the gap responsible for these discrepancies. Our findings offer actionable insights into the respective strengths and limitations of each modality and outline a path toward more robust and transferable validation of autonomous driving systems.


Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection

Cha, Woohyun, Cha, Junhyeok, Shin, Jaeyong, Kim, Donghyeon, Park, Jaeheung

arXiv.org Artificial Intelligence

This paper proposes a novel alternative to existing sim-to-real methods for training control policies with simulated experiences. Prior sim-to-real methods for legged robots mostly rely on the domain randomization approach, where a fixed finite set of simulation parameters is randomized during training. Instead, our method adds state-dependent perturbations to the input joint torque used for forward simulation during the training phase. These state-dependent perturbations are designed to simulate a broader range of reality gaps than those captured by randomizing a fixed set of simulation parameters. Experimental results show that our method enables humanoid locomotion policies that achieve greater robustness against complex reality gaps unseen in the training domain. Deep Reinforcement Learning (DRL) for robotic applications has gained significant attention due to its demonstrated robustness and versatility. Although DRL algorithms are capable of solving complex, high-dimensional control problems, commonly used on-policy methods often require a prohibitively large amount of data, posing a substantial challenge when collecting sufficient samples solely from real hardware. Moreover, the exploration process required for policy improvement in early training stages can raise safety concerns for both the physical robot and its operational environment.
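The contrast with domain randomization can be made concrete with a minimal sketch, assuming a velocity-dependent perturbation as a stand-in for the paper's state-dependent design (the actual perturbation structure and scale are the paper's; everything below is an illustrative assumption):

```python
# Illustrative sketch (not the paper's implementation): perturbing the
# commanded joint torques as a function of the joint state before forward
# simulation, instead of randomizing a fixed set of physics parameters.
import random

def perturbed_torque(tau_cmd, joint_vel, scale=0.05, rng=random):
    """Add a perturbation whose magnitude grows with joint velocity,
    emulating unmodeled friction and actuator effects."""
    return [
        tau + scale * abs(v) * rng.uniform(-1.0, 1.0)
        for tau, v in zip(tau_cmd, joint_vel)
    ]

# Torques the simulator actually integrates differ from the commanded ones.
tau_sim = perturbed_torque([10.0, -5.0], joint_vel=[1.2, -0.4])
```

Because the perturbation depends on the state rather than on a fixed parameter draw, each simulated rollout exposes the policy to a different, state-correlated mismatch between commanded and applied torque.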


A Simulation Pipeline to Facilitate Real-World Robotic Reinforcement Learning Applications

Silveira, Jefferson, Marshall, Joshua A., Givigi, Sidney N. Jr

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has gained traction for its success in solving complex tasks for robotic applications. However, its deployment on physical robots remains challenging due to safety risks and the comparatively high costs of training. To avoid these problems, RL agents are often trained on simulators, which introduces a new problem related to the gap between simulation and reality. This paper presents an RL pipeline designed to help reduce the reality gap and facilitate developing and deploying RL policies for real-world robotic systems. The pipeline organizes the RL training process into an initial step for system identification and three training stages: core simulation training, high-fidelity simulation, and real-world deployment, each adding levels of realism to reduce the sim-to-real gap. Each training stage takes an input policy, improves it, and either passes the improved policy to the next stage or loops it back for further improvement. This iterative process continues until the policy achieves the desired performance. The pipeline's effectiveness is shown through a case study with the Boston Dynamics Spot mobile robot used in a surveillance application. The case study presents the steps taken at each pipeline stage to obtain an RL agent to control the robot's position and orientation.
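The staged, iterative structure described above can be expressed as a short control loop. This is a minimal sketch under assumed names, not the paper's code: each stage improves the incoming policy and either promotes it to the next stage or loops back for more training.

```python
# Minimal sketch of a staged sim-to-real training loop: core simulation,
# high-fidelity simulation, then real-world deployment. Stage functions,
# the evaluator, and the threshold are hypothetical placeholders.

def run_pipeline(policy, stages, evaluate, threshold=0.9, max_loops=10):
    """stages: ordered list of train_fn(policy) -> improved policy.
    evaluate: policy -> score in [0, 1]; a stage repeats until the
    score clears the threshold, then hands the policy to the next stage."""
    for train in stages:
        for _ in range(max_loops):
            policy = train(policy)
            if evaluate(policy) >= threshold:   # good enough: promote
                break
    return policy

# Toy example: a scalar "policy" that each stage nudges upward.
stages = [lambda p: p + 0.2] * 3
final = run_pipeline(0.0, stages, evaluate=lambda p: min(p, 1.0))
```

The design choice worth noting is the inner loop: a stage can be repeated (system re-identification, more rollouts) without restarting the whole pipeline, which is what makes the process iterative rather than a one-way handoff.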


Optimal Gait Design for a Soft Quadruped Robot via Multi-fidelity Bayesian Optimization

Tan, Kaige, Niu, Xuezhi, Ji, Qinglei, Feng, Lei, Törngren, Martin

arXiv.org Artificial Intelligence

This study focuses on the locomotion capability improvement in a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of modeling discrepancies, we implement a multi-fidelity BO approach, combining data from both simulation and physical experiments throughout training and optimization. This strategy enables the adaptive refinement of the gait pattern and ensures a smooth transition from simulation to real-world deployment for the controller. Moreover, we integrate a computational task off-loading architecture by edge computing, which reduces the onboard computational and memory overhead, to improve real-time control performance and facilitate an effective online learning process. The proposed approach successfully achieves optimal walking gait design for physical deployment with high efficiency, effectively addressing challenges related to the reality gap in soft robotics.
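The multi-fidelity strategy can be illustrated with a deliberately simplified sketch: screen many gait-parameter candidates with the cheap simulated objective, then spend the scarce physical trials only on the most promising few. The paper uses Bayesian optimization over both fidelities; the two-pass ranking below and the toy objectives are assumptions made for brevity.

```python
# Hedged sketch of the multi-fidelity idea (not the paper's BO algorithm):
# a low-fidelity simulated objective screens candidates, and only the top
# few are evaluated with the expensive "real" objective.
import random

def multi_fidelity_search(sim_eval, real_eval, n_sim=50, n_real=3, seed=0):
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(n_sim)]  # gait params
    # Low-fidelity pass: rank every candidate by the simulated objective.
    ranked = sorted(candidates, key=sim_eval, reverse=True)
    # High-fidelity pass: evaluate only the top few on "hardware".
    return max(ranked[:n_real], key=real_eval)

# Toy objectives: the simulator is a biased copy of reality (a reality gap),
# with optima at 0.5 (sim) and 0.6 (real).
real = lambda x: -(x - 0.6) ** 2
sim = lambda x: -(x - 0.5) ** 2
x_star = multi_fidelity_search(sim, real)
```

Even with the biased simulator, the expensive evaluations correct the screening toward the real optimum, which is the essential benefit of mixing fidelities during optimization.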


Low Fidelity Digital Twin for Automated Driving Systems: Use Cases and Automatic Generation

Vlasak, Jiri, Klapálek, Jaroslav, Kollarčík, Adam, Sojka, Michal, Hanzálek, Zdeněk

arXiv.org Artificial Intelligence

Automated driving systems are an integral part of the automotive industry. Tools such as the Robot Operating System and simulators support their development. However, in the end, the developers must test their algorithms on a real vehicle. To better observe the difference between reality and simulation--the reality gap--digital twin technology offers real-time communication between the real vehicle and its model. We present a low-fidelity digital twin generator and describe situations where automatic generation is preferable to high-fidelity simulation. We validated our approach of generating a virtual environment with a vehicle model by replaying the data recorded from the real vehicle.


Real-to-Sim Adaptation via High-Fidelity Simulation to Control a Wheeled-Humanoid Robot with Unknown Dynamics

Baek, Donghoon, Sim, Youngwoo, Purushottam, Amartya, Gupta, Saurabh, Ramos, Joao

arXiv.org Artificial Intelligence

Model-based controllers that use a linearized model around the system's equilibrium point are a common approach to controlling a wheeled humanoid due to their lower computational load and ease of stability analysis. However, controlling a wheeled humanoid robot while it lifts an unknown object presents significant challenges, primarily due to the lack of knowledge of the object's dynamics. This paper presents a framework designed for predicting the new equilibrium point explicitly to control a wheeled-legged robot with unknown dynamics. We estimate the total mass and center of mass of the system from its response to the initially unknown dynamics, then calculate the new equilibrium point accordingly. To avoid using additional sensors (e.g., a force-torque sensor) and reduce the effort of obtaining expensive real data, a data-driven approach is utilized with a novel real-to-sim adaptation. A more accurate nonlinear dynamics model, offering a closer representation of real-world physics, is injected into a rigid-body simulation for real-to-sim adaptation. The nonlinear dynamics model parameters were optimized using Particle Swarm Optimization. The efficacy of this framework was validated on a physical wheeled inverted pendulum, a simplified model of a wheeled-legged robot. The experimental results indicate that employing a more precise analytical model with optimized parameters significantly reduces the gap between simulation and reality, thus improving the efficiency of a model-based controller in controlling a wheeled robot with unknown dynamics.
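The parameter-fitting step can be sketched with a bare-bones PSO. This is an illustrative toy, not the paper's setup: the "loss" stands in for the mismatch between simulated and measured responses, and the swarm coefficients are conventional textbook values, not values from the paper.

```python
# Illustrative sketch: fitting dynamics-model parameters with Particle
# Swarm Optimization so that simulated responses match measured ones.
# All names, coefficients, and the toy loss are assumptions.
import random

def pso_fit(loss, dim=2, n_particles=20, iters=50, seed=1):
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    gbest = min(pbest, key=loss)[:]             # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                      # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social
                pos[i][d] += vel[i][d]
            if loss(pos[i]) < loss(pbest[i]):
                pbest[i] = pos[i][:]
                if loss(pos[i]) < loss(gbest):
                    gbest = pos[i][:]
    return gbest

# Toy sim-vs-real gap: true parameter offsets at (0.3, -0.2).
gap = lambda p: (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2
params = pso_fit(gap)
```

In the real-to-sim setting, `gap` would compare simulated rollouts of the nonlinear model against recorded hardware trajectories; the swarm then pulls the model parameters toward whatever values best explain the measured physics.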


Benchmarking the Sim-to-Real Gap in Cloth Manipulation

Blanco-Mulero, David, Barbany, Oriol, Alcan, Gokhan, Colomé, Adrià, Torras, Carme, Kyrki, Ville

arXiv.org Artificial Intelligence

Realistic physics engines play a crucial role for learning to manipulate deformable objects such as garments in simulation. By doing so, researchers can circumvent challenges such as sensing the deformation of the object in the real world. In spite of the extensive use of simulations for this task, few works have evaluated the reality gap between deformable object simulators and real-world data. We present a benchmark dataset to evaluate the sim-to-real gap in cloth manipulation. The dataset is collected by performing a dynamic as well as a quasi-static cloth manipulation task involving contact with a rigid table. We use the dataset to evaluate the reality gap, computational time, and simulation stability of four popular deformable object simulators: MuJoCo, Bullet, Flex, and SOFA. Additionally, we discuss the benefits and drawbacks of each simulator. The benchmark dataset is open-source. Supplementary material, videos, and code can be found at https://sites.google.com/view/cloth-sim2real-benchmark.


Sim-to-Real Transfer of Compliant Bipedal Locomotion on Torque Sensor-Less Gear-Driven Humanoid

Masuda, Shimpei, Takahashi, Kuniyuki

arXiv.org Artificial Intelligence

Sim-to-real is a mainstream method to cope with the large number of trials needed by typical deep reinforcement learning methods. However, transferring a policy trained in simulation to actual hardware remains an open challenge due to the reality gap. In particular, the characteristics of actuators in legged robots have a considerable influence on sim-to-real transfer. There are two challenges: 1) High reduction ratio gears are widely used in actuators, and the reality gap issue becomes especially pronounced when backdrivability is considered in controlling joints compliantly. 2) The difficulty in achieving stable bipedal locomotion causes typical system identification methods to fail to sufficiently transfer the policy. For these two challenges, we propose 1) a new simulation model of gears and 2) a method for system identification that can utilize failed attempts. The method's effectiveness is verified using a biped robot, the ROBOTIS-OP3, and the sim-to-real transferred policy can stabilize the robot under severe disturbances and walk on uneven surfaces without using force and torque sensors.


End-to-end Reinforcement Learning for Time-Optimal Quadcopter Flight

Ferede, Robin, De Wagter, Christophe, Izzo, Dario, de Croon, Guido C. H. E.

arXiv.org Artificial Intelligence

Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner loop controller - an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that gives direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner loop controller, both in simulated and real-world flight. E2E showcases a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting end-to-end reinforcement learning's potential. The performance drop observed from simulation to reality shows potential for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data.
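The residual-model idea admits a compact sketch: keep a nominal analytic model and fit a correction to the discrepancy observed in flight data. The linear least-squares residual below is a toy assumption chosen for brevity; the paper's residual model is learned from real flights and is certainly richer.

```python
# Hedged sketch of residual modeling for the reality gap (toy assumption:
# a linear residual fitted by least squares; the paper's learned residual
# and adaptive compensation are more sophisticated).

def fit_linear_residual(cmds, measured, nominal):
    """Fit measured - nominal(cmd) = a * cmd + b by least squares and
    return the corrected model nominal(cmd) + a * cmd + b."""
    resid = [m - nominal(c) for c, m in zip(cmds, measured)]
    n = len(cmds)
    mc = sum(cmds) / n
    mr = sum(resid) / n
    a = (sum((c - mc) * (r - mr) for c, r in zip(cmds, resid))
         / sum((c - mc) ** 2 for c in cmds))
    b = mr - a * mc
    return lambda c: nominal(c) + a * c + b

nominal = lambda c: 2.0 * c                      # idealized thrust curve
cmds = [0.2, 0.4, 0.6, 0.8]                      # motor commands
measured = [0.35, 0.75, 1.15, 1.55]              # "real" thrust: 2c - 0.05
corrected = fit_linear_residual(cmds, measured, nominal)
```

Once fitted, the corrected model replaces the nominal one inside the simulator, so the policy trains against dynamics that already absorb the systematic thrust error instead of relying on inner-loop margins to hide it.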