Goto

Collaborating Authors

 simulation framework



AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Neural Information Processing Systems

Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their ability to follow user instructions well.Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following process faces three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these bottlenecks with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design LLM based simulator for human feedback that is 45x cheaper than crowdworkers and displays high agreement with humans. Second, we identify an evaluation dataset representative of real-world instructions and propose an automatic evaluation procedure. Third, we contribute reference implementations for several methods (PPO, best-of-n, expert iteration, among others) that learn from pairwise feedback. Finally, as an end-to-end validation of AlpacaFarm, we train and evaluate eleven models on 10k pairs of human feedback and show that rankings of models trained in AlpacaFarm match rankings of models trained on human data. As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% win-rate improvement against Davinci003.


Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework

Park, Yu Min, Tun, Yan Kyaw, Huh, Eui-Nam, Saad, Walid, Hong, Choong Seon

arXiv.org Artificial Intelligence

Beamforming is a key technology in millimeter-wave (mmWave) communications that improves signal transmission by optimizing directionality and intensity. However, conventional channel estimation methods, such as pilot signals or beam sweeping, often fail to adapt to rapidly changing communication environments. To address this limitation, multimodal sensing-aided beam prediction has gained significant attention, using various sensing data from devices such as LiDAR, radar, GPS, and RGB images to predict user locations or network conditions. Despite its promising potential, the adoption of multimodal sensing-aided beam prediction is hindered by high computational complexity, high costs, and limited datasets. Thus, in this paper, a novel resource-efficient learning framework is introduced for beam prediction, which leverages a custom-designed cross-modal relational knowledge distillation (CRKD) algorithm specifically tailored for beam prediction tasks, to transfer knowledge from a multimodal network to a radar-only student model, achieving high accuracy with reduced computational cost. To enable multimodal learning with realistic data, a novel multimodal simulation framework is developed while integrating sensor data generated from the autonomous driving simulator CARLA with MATLAB-based mmWave channel modeling, and reflecting real-world conditions. The proposed CRKD achieves its objective by distilling relational information across different feature spaces, which enhances beam prediction performance without relying on expensive sensor data. Simulation results demonstrate that CRKD efficiently distills multimodal knowledge, allowing a radar-only model to achieve $94.62%$ of the teacher performance. In particular, this is achieved with just $10%$ of the teacher network's parameters, thereby significantly reducing computational complexity and dependence on multimodal sensor data.


DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

Yu, Fengze, Li, Leshu, McDanel, Brad, Zhang, Sai Qian

arXiv.org Artificial Intelligence

Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed speculative decoding framework that extends SD to multi-device deployments through coordinated draft-target execution. Given the lack of prior work on simulating this paradigm, we first introduce DSD-Sim, a discrete-event simulator that captures network, batching, and scheduling dynamics. Building on insights from DSD-Sim, we further design an Adaptive Window Control (AWC) policy that dynamically adjusts speculation window size to optimize throughput. Experiments across diverse workloads show that DSD achieves up to 1.1x speedup and 9.7% higher throughput over existing SD baselines, enabling agile and scalable LLM serving across edge and cloud.


DecNefLab: A Modular and Interpretable Simulation Framework for Decoded Neurofeedback

Olza, Alexander, Santana, Roberto, Soto, David

arXiv.org Artificial Intelligence

Decoded Neurofeedback (DecNef) is a flourishing non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in DecNef research remains constrained by subject-dependent learning variability, reliance on indirect measures to quantify progress, and the high cost and time demands of experimentation. We present DecNefLab, a modular and interpretable simulation framework that formalizes DecNef as a machine learning problem. Beyond providing a virtual laboratory, DecNefLab enables researchers to model, analyze and understand neurofeedback dynamics. Using latent variable generative models as simulated participants, DecNefLab allows direct observation of internal cognitive states and systematic evaluation of how different protocol designs and subject characteristics influence learning. We demonstrate how this approach can (i) reproduce empirical phenomena of DecNef learning, (ii) identify conditions under which DecNef feedback fails to induce learning, and (iii) guide the design of more robust and reliable DecNef protocols in silico before human implementation. In summary, DecNefLab bridges computational modeling and cognitive neuroscience, offering a principled foundation for methodological innovation, robust protocol design, and ultimately, a deeper understanding of DecNef-based brain modulation.


From Fold to Function: Dynamic Modeling and Simulation-Driven Design of Origami Mechanisms

Han, Tianhui, Singh, Shashwat, Patil, Sarvesh, Temel, Zeynep

arXiv.org Artificial Intelligence

Origami-inspired mechanisms can transform flat sheets into functional three-dimensional dynamic structures that are lightweight, compact, and capable of complex motion. These properties make origami increasingly valuable in robotic and deployable systems. However, accurately simulating their folding behavior and interactions with the environment remains challenging. To address this, we present a design framework for origami mechanism simulation that utilizes MuJoCo's deformable-body capabilities. In our approach, origami sheets are represented as graphs of interconnected deformable elements with user-specified constraints such as creases and actuation, defined through an intuitive graphical user interface (GUI). This framework allows users to generate physically consistent simulations that capture both the geometric structure of origami mechanisms and their interactions with external objects and surfaces. We demonstrate our method's utility through a case study on an origami catapult, where design parameters are optimized in simulation using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and validated experimentally on physical prototypes. The optimized structure achieves improved throwing performance, illustrating how our system enables rapid, simulation-driven origami design, optimization, and analysis.



ROOM: A Physics-Based Continuum Robot Simulator for Photorealistic Medical Datasets Generation

Esposito, Salvatore, Mattamala, Matías, Rebain, Daniel, Zhang, Francis Xiatian, Dhaliwal, Kevin, Khadem, Mohsen, Ramamoorthy, Subramanian

arXiv.org Artificial Intelligence

Continuum robots are advancing bronchoscopy procedures by accessing complex lung airways and enabling targeted interventions. However, their development is limited by the lack of realistic training and test environments: Real data is difficult to collect due to ethical constraints and patient safety concerns, and developing autonomy algorithms requires realistic imaging and physical feedback. We present ROOM (Realistic Optical Observation in Medicine), a comprehensive simulation framework designed for generating photorealistic bronchoscopy training data. By leveraging patient CT scans, our pipeline renders multi-modal sensor data including RGB images with realistic noise and light specularities, metric depth maps, surface normals, optical flow and point clouds at medically relevant scales. We validate the data generated by ROOM in two canonical tasks for medical robotics -- multi-view pose estimation and monocular depth estimation, demonstrating diverse challenges that state-of-the-art methods must overcome to transfer to these medical settings. Furthermore, we show that the data produced by ROOM can be used to fine-tune existing depth estimation models to overcome these challenges, also enabling other downstream applications such as navigation. We expect that ROOM will enable large-scale data generation across diverse patient anatomies and procedural scenarios that are challenging to capture in clinical settings. Code and data: https://github.com/iamsalvatore/room.


DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning

Zhang, Xinhong, Wang, Runqing, Ren, Yunfan, Sun, Jian, Fang, Hao, Chen, Jie, Wang, Gang

arXiv.org Artificial Intelligence

Abstract-- This letter introduces DiffAero, a lightweight, GPU-accelerated, and fully differentiable simulation framework designed for efficient quadrotor control policy learning. Dif-fAero supports both environment-level and agent-level parallelism and integrates multiple dynamics models, customizable sensor stacks (IMU, depth camera, and LiDAR), and diverse flight tasks within a unified, GPU-native training interface. By fully parallelizing both physics and rendering on the GPU, DiffAero eliminates CPU-GPU data transfer bottlenecks and delivers orders-of-magnitude improvements in simulation throughput. In contrast to existing simulators, DiffAero not only provides high-performance simulation but also serves as a research platform for exploring differentiable and hybrid learning algorithms. Extensive benchmarks and real-world flight experiments demonstrate that DiffAero and hybrid learning algorithms combined can learn robust flight policies in hours on consumer-grade hardware. Quadrotors--and swarms of quadrotors thereof--are increasingly deployed in complex environments for aerial inspection, environmental monitoring, and high-speed racing, owing to their agile maneuverability and onboard sensing capabilities. End-to-end learning addresses these limitations by training neural flight policies that map raw sensor observations directly to control commands, thereby streamlining the autonomy stack and enabling tighter feedback loops [4].


Integrated Simulation Framework for Adversarial Attacks on Autonomous Vehicles

Anagnostopoulos, Christos, Kapsali, Ioulia, Gkillas, Alexandros, Piperigkos, Nikos, Lalos, Aris S.

arXiv.org Artificial Intelligence

Autonomous vehicles (AVs) rely on complex perception and communication systems, making them vulnerable to adversarial attacks that can compromise safety. While simulation offers a scalable and safe environment for robustness testing, existing frameworks typically lack comprehensive supportfor modeling multi-domain adversarial scenarios. This paper introduces a novel, open-source integrated simulation framework designed to generate adversarial attacks targeting both perception and communication layers of AVs. The framework provides high-fidelity modeling of physical environments, traffic dynamics, and V2X networking, orchestrating these components through a unified core that synchronizes multiple simulators based on a single configuration file. Our implementation supports diverse perception-level attacks on LiDAR sensor data, along with communication-level threats such as V2X message manipulation and GPS spoofing. Furthermore, ROS 2 integration ensures seamless compatibility with third-party AV software stacks. We demonstrate the framework's effectiveness by evaluating the impact of generated adversarial scenarios on a state-of-the-art 3D object detector, revealing significant performance degradation under realistic conditions.