stability constraint

Stability and Accuracy Trade-offs in Statistical Estimation

Chakraborty, Abhinav, Luo, Yuetian, Barber, Rina Foygel

arXiv.org Machine Learning

Algorithmic stability is a central concept in statistics and learning theory that measures how sensitive an algorithm's output is to small changes in the training data. Stability plays a crucial role in understanding generalization, robustness, and replicability, and a variety of stability notions have been proposed in different learning settings. However, while stability entails desirable properties, it is typically not sufficient on its own for statistical learning -- and indeed, it may be at odds with accuracy, since an algorithm that always outputs a constant function is perfectly stable but statistically meaningless. Thus, it is essential to understand the potential statistical cost of stability. In this work, we address this question by adopting a statistical decision-theoretic perspective, treating stability as a constraint in estimation. Focusing on two representative notions, worst-case stability and average-case stability, we first establish general lower bounds on the achievable estimation accuracy under each type of stability constraint. We then develop optimal stable estimators for four canonical estimation problems, including several mean estimation and regression settings. Together, these results characterize the optimal trade-offs between stability and accuracy across these tasks. Our findings formalize the intuition that average-case stability imposes a qualitatively weaker restriction than worst-case stability, and they further reveal that the gap between these two can vary substantially across different estimation problems.
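The worst-case vs. average-case distinction can be made concrete with leave-one-out sensitivity: how far an estimator's output moves when a single training point is deleted. The sketch below is only an illustration of the two notions on the sample mean and median, not the paper's estimators or its formal stability definitions.

```python
import numpy as np

def loo_sensitivity(estimator, data):
    """Leave-one-out sensitivity of an estimator: how much its output
    moves when a single point is removed from the sample."""
    full = estimator(data)
    deltas = [abs(estimator(np.delete(data, i)) - full) for i in range(len(data))]
    # max delta ~ worst-case stability; mean delta ~ average-case stability
    return max(deltas), float(np.mean(deltas))

rng = np.random.default_rng(0)
x = rng.normal(size=100)
x[0] = 50.0  # a single outlier

worst_mean, avg_mean = loo_sensitivity(np.mean, x)
worst_med, avg_med = loo_sensitivity(np.median, x)
# The sample mean's worst-case sensitivity is driven entirely by the outlier,
# while the median barely moves no matter which point is deleted.
```

This already shows the gap the paper studies: an estimator can look stable on average while having much larger worst-case sensitivity.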


A Novel Multi-Timescale Stability-Preserving Hierarchical Reinforcement Learning Controller Framework for Adaptive Control in High-Dimensional Dynamical Systems

Khaniki, Mohammad Ali Labbaf, Taroodi, Fateme, Safizadeh, Benyamin

arXiv.org Artificial Intelligence

Controlling high-dimensional stochastic systems, critical in robotics, autonomous vehicles, and hyperchaotic systems, faces the curse of dimensionality, lacks temporal abstraction, and often fails to ensure stochastic stability. To overcome these limitations, this study introduces the Multi-Timescale Lyapunov-Constrained Hierarchical Reinforcement Learning (MTLHRL) framework. MTLHRL integrates a hierarchical policy within a semi-Markov Decision Process (SMDP), featuring a high-level policy for strategic planning and a low-level policy for reactive control, which effectively manages complex, multi-timescale decision-making and reduces dimensionality overhead. Stability is rigorously enforced using a neural Lyapunov function optimized via Lagrangian relaxation and multi-timescale actor-critic updates, ensuring mean-square boundedness or asymptotic stability in the face of stochastic dynamics. The framework promotes efficient and reliable learning through trust-region constraints and decoupled optimization. Extensive simulations on an 8D hyperchaotic system and a 5-DOF robotic manipulator demonstrate MTLHRL's empirical superiority. It significantly outperforms baseline methods in both stability and performance, recording the lowest error indices (e.g., Integral Absolute Error (IAE): 3.912 in hyperchaotic control and IAE: 1.623 in robotics), achieving faster convergence, and exhibiting superior disturbance rejection. MTLHRL offers a theoretically grounded and practically viable solution for robust control of complex stochastic systems.
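The core mechanism, enforcing a Lyapunov decrease condition through Lagrangian relaxation, can be sketched in a few lines. The constraint form below (expected decrease of a candidate Lyapunov function `V` by at least `alpha`) and all names are illustrative assumptions, not the MTLHRL implementation or its exact mean-square boundedness condition.

```python
import numpy as np

def lagrangian_step(policy_loss, V, s, s_next, lam, alpha=0.01, lam_lr=0.1):
    """One Lagrangian-relaxation update for a hypothetical Lyapunov
    constraint E[V(s') - V(s)] <= -alpha.  Primal: penalized policy loss;
    dual: gradient ascent on the multiplier lambda."""
    violation = float(np.mean(V(s_next) - V(s)) + alpha)   # constraint residual
    total_loss = policy_loss + lam * max(violation, 0.0)   # primal objective
    lam_new = max(lam + lam_lr * violation, 0.0)           # dual ascent, lambda >= 0
    return total_loss, lam_new

# Toy quadratic Lyapunov candidate V(s) = ||s||^2 on contracting dynamics.
V = lambda s: np.sum(s**2, axis=-1)
s = np.array([[1.0, 0.0], [0.0, 2.0]])
s_next = 0.9 * s  # V decreases, so the constraint is satisfied
loss, lam = lagrangian_step(policy_loss=1.0, V=V, s=s, s_next=s_next, lam=0.5)
```

When the constraint is satisfied the penalty vanishes and the multiplier decays; when violated, the penalty grows until the policy restores the decrease condition.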


One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

Gao, Lei, Huang, Shihong, Wang, Shengjie, Ma, Hong, Zhang, Feng, Bao, Hengda, Chen, Qichang, Zhou, Weihua

arXiv.org Artificial Intelligence

The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. Existing learning-based approaches often neglect practical stability-related constraints and exhibit limitations in generalizing across diverse bin dimensions. To address these limitations, we propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). The primary advantage of O4M-SP is its ability to handle various bin dimensions in a single training process while incorporating support and weight constraints common in practice. Our training method introduces two innovative mechanisms. First, it employs a weighted reward function that integrates loading rate and a new height difference metric for packing layouts, promoting improved bin utilization through flatter packing configurations. Second, it combines clipped policy gradient optimization with a tailored policy drifting method to mitigate policy entropy collapse, encouraging exploration at critical decision nodes during packing to avoid suboptimal solutions. Extensive experiments demonstrate that O4M-SP generalizes successfully across diverse bin dimensions and significantly outperforms baseline methods. Furthermore, O4M-SP exhibits strong practical applicability by effectively addressing packing scenarios with stability constraints.
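The weighted reward idea, loading rate traded off against a flatness term, can be sketched as follows. The weights and the particular height-difference metric here are hypothetical stand-ins; the paper defines its own metric over packing layouts.

```python
import numpy as np

def packing_reward(heightmap, bin_volume, packed_volume, w_load=1.0, w_height=0.5):
    """Hypothetical weighted reward in the spirit of O4M-SP: reward bin
    utilization, penalize uneven (non-flat) packing surfaces."""
    loading_rate = packed_volume / bin_volume             # fraction of bin filled
    height_diff = heightmap.max() - heightmap.min()       # roughness of the layout
    return w_load * loading_rate - w_height * height_diff / heightmap.max()

# Two layouts with identical loaded volume but different flatness.
flat = np.array([[2.0, 2.0], [2.0, 2.0]])
jagged = np.array([[4.0, 0.0], [0.0, 4.0]])
r_flat = packing_reward(flat, bin_volume=32, packed_volume=8)
r_jag = packing_reward(jagged, bin_volume=32, packed_volume=8)
# Equal loading rate, but the flat layout earns the higher reward,
# which is exactly the bias toward flatter configurations described above.
```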



…the one most critical of the paper, felt that the fundamental method we introduce here (that of constructing explicitly…

Neural Information Processing Systems

We also appreciate that the reviewers, especially Reviewer 2, had some concerns with aspects of the paper. To each of these points, we'd like to make the following comments. We're happy to include the traditional (e.g., Tanh or LSTM) RNN for comparison, and will add this to the … We didn't include Embed2Control-style comparisons … We can include a discussion and illustration of this in the revision. The "simple" model always refers to a simple feedforward network (with the same structure as …). We'll fully describe the video texture setup in the text (e.g., the source videos are actual videos of physical fire from …). Thanks for pointing out the confusion here; we'll clarify all of these. However, we'll certainly discuss this point more. We'll include all these details for the experiments (lack of space to describe them all here).


LAPSO: A Unified Optimization View for Learning-Augmented Power System Operations

Xu, Wangkun, Chu, Zhongda, Teng, Fei

arXiv.org Artificial Intelligence

With the high penetration of renewables, traditional model-based power system operation is challenged to deliver economic, stable, and robust decisions. Machine learning has emerged as a powerful modeling tool for capturing complex dynamics to address these challenges. However, its separate design often lacks systematic integration with existing methods. To fill the gap, this paper proposes a holistic framework of Learning-Augmented Power System Operations (LAPSO, pronounced as Lap-So). Adopting a native optimization perspective, LAPSO is centered on the operation stage and aims to break the boundary between temporally siloed power system tasks, such as forecasting, operation, and control, while unifying the objectives of machine learning and model-based optimization at both training and inference stages. Systematic analysis and simulations demonstrate the effectiveness of applying LAPSO in designing new integrated algorithms, such as stability-constrained optimization (SCO) and objective-based forecasting (OBF), while enabling end-to-end tracing of different sources of uncertainty. In addition, a dedicated Python package, lapso, is introduced to automatically augment existing power system optimization models with learnable components. All code and data are available at https://github.com/xuwkk/lapso_exp.

Index Terms--Power system operation, machine learning, objective-based forecasting, stability-constrained optimization.

A. Background and Motivation

Power system decision-making consists of sequentially connected tasks, including modeling/forecasting, operation, and control (see Figure 1(a)). With the decarbonization need, traditional model-based approaches face significant challenges. For example, the increasing uncertainty associated with renewable generation undermines the reliability of deterministic forecasting and power system operation (PSO) [2]. Meanwhile, the declining share of inertia from synchronous generators (SGs) can cause grid instability [3].
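The objective-based forecasting (OBF) idea, choosing a forecast to minimize downstream operation cost rather than forecast error, can be illustrated with a toy dispatch problem. The asymmetric cost function, prices, and grid search below are all hypothetical; they are not LAPSO's formulation.

```python
import numpy as np

def operation_cost(dispatch, demand, over_price=1.0, under_price=10.0):
    """Hypothetical asymmetric dispatch cost: under-supplying the grid is
    ten times more expensive than over-supplying (illustrative stand-in)."""
    gap = dispatch - demand
    return float(np.where(gap >= 0, over_price * gap, -under_price * gap).mean())

rng = np.random.default_rng(0)
demand = rng.normal(100.0, 10.0, size=1000)

# Accuracy-driven forecast: minimize squared forecast error (the mean).
mse_forecast = demand.mean()

# Objective-based forecast: pick the value that minimizes downstream cost.
grid = np.linspace(80, 130, 501)
obf_forecast = grid[np.argmin([operation_cost(g, demand) for g in grid])]
# With asymmetric costs the OBF solution is biased above the mean,
# because under-supply is far more costly than over-supply.
```

This is the sense in which the abstract speaks of "unifying the objectives of machine learning and model-based optimizations": the forecaster's training signal becomes the operation cost itself.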


Reviews: Learning to Exploit Stability for 3D Scene Parsing

Neural Information Processing Systems

The goal of this paper is to output a set of 3D bounding boxes and a set of dominant planes for a scene depicted in a single image. The key insight is to incorporate stability constraints in the 3D layout, i.e., the reconstructed 3D boxes should not move too far under simulation (in Bullet) with physical forces (gravity, friction). Parameters for 3D boxes are regressed using a modified R-CNN training loss, and dominant planes for the walls and floors are regressed via an RNN. A stability criterion is used to update the output 3D scene (via REINFORCE): the predicted 3D layout is run through the Bullet simulator and 3D displacements are checked. Results are shown on synthetic (SUNCG, SceneNet RGB-D) and real (SUN RGB-D) datasets, outperforming the factored 3D approach of [Tulsiani18].
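The displacement-based stability criterion used as the REINFORCE signal can be sketched simply. In the paper the post-simulation poses come from Bullet; here `boxes_after` is just an array of simulated box centers, and the threshold and reward values are illustrative assumptions.

```python
import numpy as np

def stability_reward(boxes_before, boxes_after, threshold=0.05):
    """Binary stability reward: a predicted 3D layout counts as stable if
    no box center moves more than `threshold` under physics simulation.
    `boxes_after` would come from a simulator such as Bullet; here it is
    supplied directly (stand-in, not the paper's code)."""
    displacement = np.linalg.norm(boxes_after - boxes_before, axis=1)
    return 1.0 if displacement.max() <= threshold else -1.0

before = np.array([[0.0, 0.0, 0.5], [0.0, 0.0, 1.5]])  # two stacked boxes
settled = before + 0.01                                 # barely moves: stable
toppled = before + np.array([[0.0, 0.0, 0.0],
                             [0.8, 0.0, -1.0]])         # top box slides off
```

A layout whose boxes stay put earns positive reward; one whose boxes topple is penalized, pushing the box regressor toward physically plausible scenes.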


Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach

Wang, Renzi, Acerbo, Flavia Sofia, Son, Tong Duy, Patrinos, Panagiotis

arXiv.org Artificial Intelligence

This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy. The parameters of the model are learned via a two-stage framework. By leveraging the existing dynamics knowledge, the first stage of the framework estimates the control input sequences and hence reduces the problem complexity. At the second stage, the policy is learned by solving a regularized maximum-likelihood estimation problem using the estimated control input sequences. We further extend the learning procedure by incorporating a Lyapunov stability constraint to ensure asymptotic stability of the identified model, for accurate multi-step predictions. The effectiveness of the proposed framework is validated using two autonomous driving datasets collected from human demonstrations, demonstrating its practical applicability in modelling complex nonlinear dynamics.
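The second-stage objective, regularized maximum likelihood for a mixture-of-experts policy, can be sketched for linear experts with Gaussian noise. The gating/expert parameterization, the L2 regularizer, and the omission of the autoregressive structure and Lyapunov constraint are all simplifications of this sketch, not the paper's exact model.

```python
import numpy as np

def moe_nll(X, U, gate_W, expert_Ws, sigma=1.0, reg=1e-3):
    """Regularized negative log-likelihood (up to an additive constant) of a
    mixture-of-linear-experts policy:
        p(u | x) = sum_k softmax(x @ gate_W)_k * N(u; x @ W_k, sigma^2 I)."""
    logits = X @ gate_W                                    # (n, K) gating scores
    gates = np.exp(logits - logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)              # softmax mixing weights
    lik = 0.0
    for k, W in enumerate(expert_Ws):
        resid = U - X @ W                                  # expert-k residuals
        log_comp = -0.5 * np.sum(resid**2, axis=1) / sigma**2
        lik = lik + gates[:, k] * np.exp(log_comp)         # mixture likelihood
    nll = -np.log(lik + 1e-12).mean()
    penalty = reg * sum(np.sum(W**2) for W in expert_Ws)   # L2 regularizer
    return nll + penalty

# Demonstrations generated by a single linear "expert" policy.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
W_true = np.array([[1.0], [0.5]])
U = X @ W_true
nll_good = moe_nll(X, U, np.zeros((2, 2)), [W_true, W_true])
nll_bad = moe_nll(X, U, np.zeros((2, 2)), [np.zeros((2, 1)), np.zeros((2, 1))])
# The objective is lower when the experts match the demonstrating policy.
```

In the paper this objective is minimized over the estimated control input sequences from stage one, with the Lyapunov stability constraint added on top.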