Goto

Collaborating Authors

 Dörfler, Florian


Learn to Bid as a Price-Maker Wind Power Producer

arXiv.org Artificial Intelligence

Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.


Decision-Dependent Stochastic Optimization: The Role of Distribution Dynamics

arXiv.org Artificial Intelligence

Distribution shifts have long been regarded as troublesome external forces that a decision-maker should either counteract or conform to. An intriguing feedback phenomenon termed decision dependence arises when the deployed decision affects the environment and alters the data-generating distribution. In the realm of performative prediction, this is encoded by distribution maps parameterized by decisions due to strategic behaviors. In contrast, we formalize an endogenous distribution shift as a feedback process featuring nonlinear dynamics that couple the evolving distribution with the decision. Stochastic optimization in this dynamic regime provides a fertile ground to examine the various roles played by dynamics in the composite problem structure. To this end, we develop an online algorithm that achieves optimal decision-making by both adapting to and shaping the dynamic distribution. Throughout the paper, we adopt a distributional perspective and demonstrate how this view facilitates characterizations of distribution dynamics and the optimality and generalization performance of the proposed algorithm. We showcase the theoretical results in an opinion dynamics context, where an opportunistic party maximizes the affinity of a dynamic polarized population, and in a recommender system scenario, featuring performance optimization with discrete distributions in the probability simplex.


An Adaptive Data-Enabled Policy Optimization Approach for Autonomous Bicycle Control

arXiv.org Artificial Intelligence

--This paper presents a unified control framework that integrates a Feedback Linearization (FL) controller in the inner loop with an adaptive Data-Enabled Policy Optimization (DeePO) controller in the outer loop to balance an autonomous bicycle. While the FL controller stabilizes and partially linearizes the inherently unstable and nonlinear system, its performance is compromised by unmodeled dynamics and time-varying characteristics. T o overcome these limitations, the DeePO controller is introduced to enhance adaptability and robustness. The initial control policy of DeePO is obtained from a finite set of offline, persistently exciting input and state data. T o improve stability and compensate for system nonlinearities and disturbances, a robustness-promoting regularizer refines the initial policy, while the adaptive section of the DeePO framework is enhanced with a forgetting factor to improve adaptation to time-varying dynamics. The proposed DeePO+FL approach is evaluated through simulations and real-world experiments on an instrumented autonomous bicycle. Results demonstrate its superiority over the FL-only approach, achieving more precise tracking of the reference lean angle and lean rate. N autonomous bicycle is a bicycle, equipped with electric motors, sensors, algorithms, and control systems that allow the bicycle to navigate and operate without human intervention. Autonomous bicycles are an exciting area of research and development with numerous potential applications that can improve transportation, safety, and efficiency. In bicycle-sharing systems, autonomous bicycles can enhance the user experience by autonomously traveling to a person who has requested one, eliminating the need for individuals to walk toward the bicycle [1]. Additionally, autonomous bicycles can streamline fleet management by enabling bicycles to autonomously navigate to charging stations for recharging. This eliminates the need for operators to manually collect, load, and transport bicycles to charging stations, making the process more efficient. This work was supported by the Knowledge Foundation (KKS) with grant "M alardalen University Automation Research Center (MARC)", n. 20240011. Papadopoulos are with the Division of Intelligent Future Technologies, M alardalen University, 721 23 V aster as, Sweden. F. Zhao and F. D orfler are with the Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland.


Contractivity and linear convergence in bilinear saddle-point problems: An operator-theoretic approach

arXiv.org Artificial Intelligence

We study the convex-concave bilinear saddle-point problem $\min_x \max_y f(x) + y^\top Ax - g(y)$, where both, only one, or none of the functions $f$ and $g$ are strongly convex, and suitable rank conditions on the matrix $A$ hold. The solution of this problem is at the core of many machine learning tasks. By employing tools from operator theory, we systematically prove the contractivity (in turn, the linear convergence) of several first-order primal-dual algorithms, including the Chambolle-Pock method. Our approach results in concise and elegant proofs, and it yields new convergence guarantees and tighter bounds compared to known results.


When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL

arXiv.org Artificial Intelligence

Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDP). However, various systems are inherently continuous in time, making discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatments, each interaction (measurement or switching of action) involves manual intervention and thus is inherently costly. Therefore, we generally prefer a time-adaptive approach with fewer interactions with the system. In this work, we formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge by optimizing over policies that besides control predict the duration of its application. Our formulation results in an extended MDP that any standard RL algorithm can solve. We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterpart while retaining the same or improved performance, and exhibiting robustness over discretization frequency. Finally, we propose OTaCoS, an efficient model-based algorithm for our setting. We show that OTaCoS enjoys sublinear regret for systems with sufficiently smooth dynamics and empirically results in further sample-efficiency gains.


NeoRL: Efficient Exploration for Nonepisodic RL

arXiv.org Artificial Intelligence

We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $\setO(\beta_T \sqrt{T \Gamma_T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.


Distributed Traffic Signal Control via Coordinated Maximum Pressure-plus-Penalty

arXiv.org Artificial Intelligence

This paper develops an adaptive traffic control policy inspired by Maximum Pressure (MP) while imposing coordination across intersections. The proposed Coordinated Maximum Pressure-plus-Penalty (CMPP) control policy features a local objective for each intersection that consists of the total pressure within the neighborhood and a penalty accounting for the queue capacities and continuous green time for certain movements. The corresponding control task is reformulated as a distributed optimization problem and solved via two customized algorithms: one based on the alternating direction method of multipliers (ADMM) and the other follows a greedy heuristic augmented with a majority vote. CMPP not only provides a theoretical guarantee of queuing network stability but also outperforms several benchmark controllers in simulations on a large-scale real traffic network with lower average travel and waiting time per vehicle, as well as less network congestion. Furthermore, CPMM with the greedy algorithm enjoys comparable computational efficiency as fully decentralized controllers without significantly compromising the control performance, which highlights its great potential for real-world deployment.


Bridging the Sim-to-Real Gap with Bayesian Inference

arXiv.org Artificial Intelligence

We present SIM-FSVGD for learning robot dynamics from data. As opposed to traditional methods, SIM-FSVGD leverages low-fidelity physical priors, e.g., in the form of simulators, to regularize the training of neural network models. While learning accurate dynamics already in the low data regime, SIM-FSVGD scales and excels also when more data is available. We empirically show that learning with implicit physical priors results in accurate mean model estimation as well as precise uncertainty quantification. We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system. Using model-based RL, we demonstrate a highly dynamic parking maneuver with drifting, using less than half the data compared to the state of the art.


Towards a Systems Theory of Algorithms

arXiv.org Artificial Intelligence

Traditionally, numerical algorithms are seen as isolated pieces of code confined to an {\em in silico} existence. However, this perspective is not appropriate for many modern computational approaches in control, learning, or optimization, wherein {\em in vivo} algorithms interact with their environment. Examples of such {\em open} include various real-time optimization-based control strategies, reinforcement learning, decision-making architectures, online optimization, and many more. Further, even {\em closed} algorithms in learning or optimization are increasingly abstracted in block diagrams with interacting dynamic modules and pipelines. In this opinion paper, we state our vision on a to-be-cultivated {\em systems theory of algorithms} and argue in favour of viewing algorithms as open dynamical systems interacting with other algorithms, physical systems, humans, or databases. Remarkably, the manifold tools developed under the umbrella of systems theory also provide valuable insights into this burgeoning paradigm shift and its accompanying challenges in the algorithmic world. We survey various instances where the principles of algorithmic systems theory are being developed and outline pertinent modeling, analysis, and design challenges.


Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning

arXiv.org Machine Learning

Wasserstein distributionally robust optimization has recently emerged as a powerful framework for robust estimation, enjoying good out-of-sample performance guarantees, well-understood regularization effects, and computationally tractable reformulations. In such framework, the estimator is obtained by minimizing the worst-case expected loss over all probability distributions which are close, in a Wasserstein sense, to the empirical distribution. In this paper, we propose a Wasserstein distributionally robust estimation framework to estimate an unknown parameter from noisy linear measurements, and we focus on the task of analyzing the squared error performance of such estimators. Our study is carried out in the modern high-dimensional proportional regime, where both the ambient dimension and the number of samples go to infinity at a proportional rate which encodes the under/over-parametrization of the problem. Under an isotropic Gaussian features assumption, we show that the squared error can be recovered as the solution of a convex-concave optimization problem which, surprinsingly, involves at most four scalar variables. Importantly, the precise quantification of the squared error allows to accurately and efficiently compare different ambiguity radii and to understand the effect of the under/over-parametrization on the estimation error. We conclude the paper with a list of exciting research directions enabled by our results.