overshoot
Author Response for ' Shaping Belief States with Generative Environment Models for RL '
We are grateful for all the constructive and actionable feedback provided by the reviewers. We believe we have addressed the key concerns raised by the reviewers below. [...]'s concerns with our main hypothesis as it has not [...] We are working to improve our explanations in section 2.2 based on all feedback. We emphasize that careful empirical experimentation in ML can also bring valuable insights to the community. Studying these factors requires an intersectional empirical study such as this paper. Probabilistic models benefit more from overshoot than deterministic models.
Hybrid LLM and Higher-Order Quantum Approximate Optimization for CSA Collateral Management
Jin, Tao, Florescu, Stuart, Heyu, null, Jin, null
We address finance-native collateral optimization under ISDA Credit Support Annexes (CSAs), where integer lots, Schedule A haircuts, RA/MTA gating, and issuer/currency/class caps create rugged, legally bounded search spaces. We introduce a certifiable hybrid pipeline purpose-built for this domain: (i) an evidence-gated LLM that extracts CSA terms to a normalized JSON (abstain-by-default, span-cited); (ii) a quantum-inspired explorer that interleaves simulated annealing with micro higher-order QAOA (HO-QAOA) on binding sub-QUBOs (subset size n <= 16, order k <= 4) to coordinate multi-asset moves across caps and RA-induced discreteness; (iii) a weighted risk-aware objective (Movement, CVaR, funding-priced overshoot) with an explicit coverage window U <= Reff+B; and (iv) CP-SAT as the single arbiter to certify feasibility and gaps, including a U-cap pre-check that reports the minimal feasible buffer B*. Encoding caps/rounding as higher-order terms lets HO-QAOA target the domain couplings that defeat local swaps. On government bond datasets and multi-CSA inputs, the hybrid improves on a strong classical baseline (BL-3) by 9.1%, 9.6%, and 10.7% across representative harnesses, delivering better cost-movement-tail frontiers under governance settings. We release governance-grade artifacts (span citations, valuation matrix audit, weight provenance, QUBO manifests, and CP-SAT traces) to make results auditable and reproducible.
Activation Steering with a Feedback Controller
Nguyen, Dung V., Vu, Hieu M., Pham, Nhi Y., Zhang, Lei, Nguyen, Tan M.
Controlling the behaviors of large language models (LLMs) is fundamental to their safety alignment and reliable deployment. However, existing steering methods are primarily driven by empirical insights and lack theoretical performance guarantees. In this work, we develop a control-theoretic foundation for activation steering by showing that popular steering methods correspond to proportional (P) controllers, with the steering vector serving as the feedback signal. Building on this finding, we propose Proportional-Integral-Derivative (PID) Steering, a principled framework that leverages the full PID controller for activation steering in LLMs. The proportional (P) term aligns activations with target semantic directions, the integral (I) term accumulates errors to enforce persistent corrections across layers, and the derivative (D) term mitigates overshoot by counteracting rapid activation changes. This closed-loop design yields interpretable error dynamics and connects activation steering to classical stability guarantees in control theory. Moreover, PID Steering is lightweight, modular, and readily integrates with state-of-the-art steering methods. Extensive experiments across multiple LLM families and benchmarks demonstrate that PID Steering consistently outperforms existing approaches, achieving more robust and reliable behavioral control.
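The roles of the three terms can be illustrated with a textbook discrete PID loop. The sketch below is not the paper's steering procedure: it drives a toy mass-spring-damper toward a target, with simulation steps only loosely standing in for layers. The plant, the gains, and the name `simulate_pid` are all assumptions made for illustration.

```python
def simulate_pid(kp, ki, kd, target=1.0, steps=3000, dt=0.01):
    """Drive a toy plant x'' = -x - 0.2 x' + u toward `target` with a PID law."""
    x, v = 0.0, 0.0                  # position and velocity of the toy plant
    integral = 0.0                   # running integral of the error (I term state)
    prev_error = target - x          # initialised so the first derivative is zero
    trajectory = []
    for _ in range(steps):
        error = target - x
        integral += error * dt
        derivative = (error - prev_error) / dt
        # P reacts to the current error, I to its history, D to its rate of change.
        u = kp * error + ki * integral + kd * derivative
        prev_error = error
        a = -1.0 * x - 0.2 * v + u   # hypothetical plant dynamics
        v += a * dt                  # semi-implicit Euler integration
        x += v * dt
        trajectory.append(x)
    return trajectory


p_only = simulate_pid(kp=4.0, ki=0.0, kd=0.0)   # overshoots and settles short of the target
pd = simulate_pid(kp=4.0, ki=0.0, kd=4.0)       # D term damps the overshoot
pid = simulate_pid(kp=4.0, ki=2.0, kd=4.0)      # I term removes the residual offset
```

On this toy plant, the P-only run peaks above the target, the PD run stays well damped but settles with an offset, and the full PID run settles at the target.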
Overshoot: Taking advantage of future gradients in momentum-based stochastic optimization
Kopal, Jakub, Gregor, Michal, de Leon-Martinez, Santiago, Simko, Jakub
Overshoot is a novel, momentum-based stochastic gradient descent optimization method designed to enhance performance beyond standard and Nesterov's momentum. In conventional momentum methods, gradients from previous steps are aggregated with the gradient at the current model weights before taking a step and updating the model. Rather than calculating the gradient at the current model weights, Overshoot calculates the gradient at model weights shifted in the direction of the current momentum. This sacrifices the immediate benefit of using the gradient w.r.t. the exact model weights now, in favor of evaluating at a point that will likely be more relevant for future updates. We show that incorporating this principle into momentum-based optimizers (SGD with momentum and Adam) results in faster convergence (saving on average at least 15% of steps). Overshoot consistently outperforms both standard and Nesterov's momentum across a wide range of tasks and integrates into popular momentum-based optimizers with zero memory and small computational overhead.
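The idea of evaluating the gradient at shifted weights can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm or their memory-efficient implementation: the function name `momentum_sgd` and the `overshoot` factor are hypothetical, and the update is plain momentum SGD on a list of floats.

```python
def momentum_sgd(grad_fn, w0, lr=0.05, beta=0.9, overshoot=0.0, steps=200):
    """Momentum SGD whose gradient is evaluated at weights shifted along the momentum.

    `overshoot` is a hypothetical knob: 0.0 gives classical momentum, and
    overshoot == beta gives a Nesterov-style look-ahead; larger values look further ahead.
    """
    w = list(w0)
    m = [0.0] * len(w)               # momentum buffer
    for _ in range(steps):
        # Probe point: the current weights moved further along the last update direction.
        probe = [wi - overshoot * lr * mi for wi, mi in zip(w, m)]
        g = grad_fn(probe)           # gradient at the shifted weights, not at w
        m = [beta * mi + gi for mi, gi in zip(m, g)]
        w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w


def quadratic_grad(w):
    """Gradient of the toy objective 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return [1.0 * w[0], 10.0 * w[1]]


w_classical = momentum_sgd(quadratic_grad, [1.0, 1.0], overshoot=0.0)
w_lookahead = momentum_sgd(quadratic_grad, [1.0, 1.0], overshoot=2.0)
```

Both runs converge toward the minimizer at the origin on this toy quadratic. The example only shows where the gradient is evaluated and makes no claim about the speed-ups reported in the abstract.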
Developing Simulation Models for Soft Robotic Grippers in Webots
Hadi, Yulyan Wahyu, Hof, Lars, Jayawardhana, Bayu, Haghighat, Bahar
Robotic simulators provide cost-effective and risk-free virtual environments for studying robotic designs, control algorithms, and sensor integrations. They typically host extensive libraries of sensors and actuators that facilitate rapid prototyping and design evaluations in simulation. The use of the most prominent existing robotic simulators is, however, limited to the simulation of rigid-link robots. On the other hand, there exist dedicated specialized environments for simulating soft robots. This separation limits the study of soft robotic systems, particularly in hybrid scenarios where soft and rigid sub-systems co-exist. In this work, we develop a lightweight open-source digital twin of a commercially available soft gripper, directly integrated within the robotic simulator Webots. We use a Rigid-Link-Discretization (RLD) model to simulate the soft gripper. Using a Particle Swarm Optimization (PSO) approach, we identify the parameters of the RLD model based on the kinematics and dynamics of the physical system and show the efficacy of our modeling approach in validation experiments. All software and experimental details are available on GitHub: https://github.com/anonymousgituser1/Robosoft2025
Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System
Schäfer, Georg, Rehrl, Jakob, Huber, Stefan, Hirlaender, Simon
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a 1-Degree of Freedom (DOF) Quanser Aero 2 system. Classical control techniques such as MPC and Linear Quadratic Regulator (LQR) are widely used due to their theoretical foundation and practical effectiveness. However, with advancements in computational techniques and machine learning, DRL approaches like PPO have gained traction in solving optimal control problems through environment interaction. This paper systematically evaluates the dynamic response characteristics of PPO and MPC, comparing their performance, computational resource consumption, and implementation complexity. Experimental results show that while LQR achieves the best steady-state accuracy, PPO excels in rise-time and adaptability, making it a promising approach for applications requiring rapid response and adaptability. Additionally, we have established a baseline for future RL-related research on this specific testbed. We also discuss the strengths and limitations of each control strategy, providing recommendations for selecting appropriate controllers for real-world scenarios.
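The dynamic response characteristics mentioned above (rise time, overshoot, steady-state accuracy) are usually computed from a sampled step response. The sketch below uses common textbook definitions, which may differ from those used in the study. It assumes a positive target, a response starting near zero that reaches 90% of the target, and treats the last sample as the steady state. The function name `step_metrics` is hypothetical.

```python
def step_metrics(times, response, target):
    """Return (rise_time, overshoot_pct, steady_state_error) for a sampled step response."""
    # Rise time: from the first sample at or above 10% of the target
    # to the first sample at or above 90% of the target.
    t10 = next(t for t, y in zip(times, response) if y >= 0.1 * target)
    t90 = next(t for t, y in zip(times, response) if y >= 0.9 * target)
    # Overshoot: how far the peak exceeds the target, as a percentage of the target.
    overshoot_pct = max(0.0, (max(response) - target) / target * 100.0)
    # Steady-state error: distance of the final sample from the target.
    steady_state_error = abs(target - response[-1])
    return t90 - t10, overshoot_pct, steady_state_error


rise, overshoot, sse = step_metrics(
    times=[0, 1, 2, 3, 4, 5],
    response=[0.0, 0.5, 0.95, 1.2, 1.05, 1.0],
    target=1.0,
)
```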
Teleoperation of a robotic manipulator in peri-personal space: a virtual wand approach
Poignant, Alexis, Morel, Guillaume, Jarrassé, Nathanaël
The paper deals with the well-known problem of teleoperating a robotic arm along six degrees of freedom. The prevailing and most effective approach to this problem involves a direct position-to-position mapping, imposing robotic end-effector movements that mirror those of the user. In the particular case where the robot stands near the operator, there are alternatives to this approach. Drawing inspiration from head pointers utilized in the 1980s, originally designed to enable drawing with limited head motions for tetraplegic individuals, we propose a "virtual wand" mapping. It employs a virtual rigid linkage between the hand and the robot's end-effector. With this approach, rotations produce amplified translations through a lever arm, creating a "rotation-to-position" coupling. This approach expands the translation workspace at the expense of a reduced rotation space. We compare the virtual wand approach to the one-to-one position mapping through the realization of 6-DoF reaching tasks. Results indicate that the two different mappings perform comparably well, are equally well-received by users, and exhibit similar motor control behaviors. Nevertheless, the virtual wand mapping is anticipated to outperform in tasks characterized by large translations and minimal effector rotations, whereas direct mapping is expected to demonstrate advantages in large rotations with minimal translations. These results pave the way for new interactions and interfaces, particularly in disability assistance utilizing head movements (instead of hands). Leveraging body parts with substantial rotations could enable the accomplishment of tasks previously deemed infeasible with standard direct coupling interfaces.
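The rotation-to-position coupling of a rigid linkage can be made concrete with a planar sketch. This is an illustration only, reduced to 2D and not taken from the paper: the function name `wand_tip` and the 0.5 m wand length are assumptions.

```python
import math


def wand_tip(hand_xy, hand_angle, wand_length):
    """Tip position of a rigid virtual wand attached to the hand (planar case)."""
    hx, hy = hand_xy
    return (hx + wand_length * math.cos(hand_angle),
            hy + wand_length * math.sin(hand_angle))


# A pure hand rotation, with no hand translation, moves the tip along an arc.
before = wand_tip((0.0, 0.0), 0.0, wand_length=0.5)
after = wand_tip((0.0, 0.0), 0.1, wand_length=0.5)
tip_shift = math.dist(before, after)   # chord length 2 * L * sin(angle / 2)
```

For a wand of length L, a rotation by a small angle θ shifts the tip by roughly L·θ, so a longer virtual wand amplifies translations more.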
On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization
Sohrabi, Motahareh, Ramirez, Juan, Zhang, Tianyue H., Lacoste-Julien, Simon, Gallego-Posada, Jose
Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the $\nu$PI algorithm and contributes an optimization perspective on Lagrange multiplier updates based on PI controllers, extending the work of Stooke, Achiam and Abbeel (2020). We provide theoretical and empirical insights explaining the inability of momentum methods to address the shortcomings of gradient descent-ascent, and contrast this with the empirical success of our proposed $\nu$PI controller. Moreover, we prove that $\nu$PI generalizes popular momentum methods for single-objective minimization. Our experiments demonstrate that $\nu$PI reliably stabilizes the multiplier dynamics and its hyperparameters enjoy robust and predictable behavior.
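A PI-style multiplier update can be sketched on a one-dimensional toy problem. This is a generic PI controller acting on the constraint violation, not the $\nu$PI algorithm itself: the problem, the gains, and the name `solve_with_pi_multiplier` are assumptions chosen for illustration.

```python
def solve_with_pi_multiplier(kp=1.0, ki=0.1, lr=0.1, steps=1000):
    """Toy problem: minimize x**2 subject to g(x) = 1 - x <= 0.

    The constrained optimum is x = 1 with Lagrange multiplier 2.
    """
    x = 0.0
    integral = 0.0
    lam = 0.0
    for _ in range(steps):
        g = 1.0 - x                              # constraint violation (the error signal)
        integral += g                            # I term: running sum of violations
        lam = max(0.0, ki * integral + kp * g)   # PI multiplier, kept non-negative
        x -= lr * (2.0 * x - lam)                # primal gradient step on x**2 + lam * (1 - x)
    return x, lam


x_star, lam_star = solve_with_pi_multiplier()
```

Setting `kp=0.0` leaves only the integral term. Up to how the non-negativity clamp is applied, that matches plain gradient ascent on the multiplier with step size `ki`.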
Haptic-Based Bilateral Teleoperation of Aerial Manipulator for Extracting Wedged Object with Compensation of Human Reaction Time
Byun, Jeonghyun, Eom, Dohyun, Kim, H. Jin
Bilateral teleoperation of an aerial manipulator facilitates the execution of industrial missions thanks to the combination of the aerial platform's maneuverability and the ability to conduct complex tasks with human supervision. Heretofore, research on such operations has focused on flying without any physical interaction or exerting a pushing force on a contact surface that does not involve abrupt changes in the interaction force. In this paper, we propose a haptic-based bilateral teleoperation strategy that compensates for human reaction time, for an aerial manipulator extracting a wedged object from a static structure (i.e., plug-pulling), which incurs an abrupt decrease in the interaction force and causes additional difficulty for an aerial platform. A haptic device composed of a 4-degree-of-freedom robotic arm and a gripper is built for the teleoperation of aerial wedged-object-extracting tasks, and a haptic-based teleoperation method for operating the aerial manipulator via the haptic device is introduced. We detect the extraction of the object by estimating the external force exerted on the aerial manipulator and generate reference trajectories for both the aerial manipulator and the haptic device after the extraction. As an example of the extraction of a wedged object, we conduct comparative plug-pulling experiments with a quadrotor-based aerial manipulator. The results validate that the proposed bilateral teleoperation method reduces the overshoot in the aerial manipulator's position and ensures fast recovery to its initial position after extracting the wedged object.