Muehlebach, Michael
Distributed Event-Based Learning via ADMM
Er, Guner Dilsad, Trimpe, Sebastian, Muehlebach, Michael
We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We can therefore guarantee convergence even if the local data-distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm and derive accelerated convergence rates in a convex setting. We also characterize the effect of communication drops and demonstrate that our algorithm is robust to communication failures. The article concludes by presenting numerical results from a distributed LASSO problem, and distributed learning tasks on MNIST and CIFAR-10 datasets. The experiments underline communication savings of 50% or more due to the event-based communication strategy, show resilience towards heterogeneous data-distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, and FedADMM.
Stochastic Online Optimization for Cyber-Physical and Robotic Systems
Ma, Hao, Zeilinger, Melanie, Muehlebach, Michael
We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam, a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.
Primal Methods for Variational Inequality Problems with Functional Constraints
Zhang, Liang, He, Niao, Muehlebach, Michael
Constrained variational inequality problems are recognized for their broad applications across various fields including machine learning and operations research. First-order methods have emerged as the standard approach for solving these problems due to their simplicity and scalability. However, they typically rely on projection or linear minimization oracles to navigate the feasible set, which becomes computationally expensive in practical scenarios featuring multiple functional constraints. Existing efforts to tackle such functional constrained variational inequality problems have centered on primal-dual algorithms grounded in the Lagrangian function. These algorithms along with their theoretical analysis often require the existence and prior knowledge of the optimal Lagrange multipliers. In this work, we propose a simple primal method, termed Constrained Gradient Method (CGM), for addressing functional constrained variational inequality problems, without necessitating any information on the optimal Lagrange multipliers. We establish a non-asymptotic convergence analysis of the algorithm for variational inequality problems with monotone operators under smooth constraints. Remarkably, our algorithms match the complexity of projection-based methods in terms of operator queries for both monotone and strongly monotone settings, while utilizing significantly cheaper oracles based on quadratic programming. Furthermore, we provide several numerical examples to evaluate the efficacy of our algorithms.
Towards a Systems Theory of Algorithms
Dรถrfler, Florian, He, Zhiyu, Belgioioso, Giuseppe, Bolognani, Saverio, Lygeros, John, Muehlebach, Michael
Traditionally, numerical algorithms are seen as isolated pieces of code confined to an {\em in silico} existence. However, this perspective is not appropriate for many modern computational approaches in control, learning, or optimization, wherein {\em in vivo} algorithms interact with their environment. Examples of such {\em open} include various real-time optimization-based control strategies, reinforcement learning, decision-making architectures, online optimization, and many more. Further, even {\em closed} algorithms in learning or optimization are increasingly abstracted in block diagrams with interacting dynamic modules and pipelines. In this opinion paper, we state our vision on a to-be-cultivated {\em systems theory of algorithms} and argue in favour of viewing algorithms as open dynamical systems interacting with other algorithms, physical systems, humans, or databases. Remarkably, the manifold tools developed under the umbrella of systems theory also provide valuable insights into this burgeoning paradigm shift and its accompanying challenges in the algorithmic world. We survey various instances where the principles of algorithmic systems theory are being developed and outline pertinent modeling, analysis, and design challenges.
Accelerated First-Order Optimization under Nonlinear Constraints
Muehlebach, Michael, Jordan, Michael I.
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms for constrained optimization. Unlike Frank-Wolfe or projected gradients, these algorithms avoid optimization over the entire feasible set at each iteration. We prove convergence to stationary points even in a nonconvex setting and we derive accelerated rates for the convex setting both in continuous time, as well as in discrete time. An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions, which naturally leads to sparse, local and convex approximations of the feasible set (even if the feasible set is nonconvex). Thus, the complexity tends to grow mildly in the number of decision variables and in the number of constraints, which makes the algorithms suitable for machine learning applications. We apply our algorithms to a compressed sensing and a sparse regression problem, showing that we can treat nonconvex $\ell^p$ constraints ($p<1$) efficiently, while recovering state-of-the-art performance for $p=1$.
Deep Backtracking Counterfactuals for Causally Compliant Explanations
Kladny, Klaus-Rudolf, von Kรผgelgen, Julius, Schรถlkopf, Bernhard, Muehlebach, Michael
Counterfactuals can offer valuable insights by answering what would have been observed under altered circumstances, conditional on a factual observation. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative the backtracking principle has emerged as an alternative philosophy where all causal laws are kept intact. In the present work, we introduce a practical method for computing backtracking counterfactuals in structural causal models that consist of deep generative components. To this end, we impose conditions on the structural assignments that enable the generation of counterfactuals by solving a tractable constrained optimization problem in the structured latent space of a causal model. Our formulation also facilitates a comparison with methods in the field of counterfactual explanations. Compared to these, our method represents a versatile, modular and causally compliant alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.
Online Learning under Adversarial Nonlinear Constraints
Kolev, Pavel, Martius, Georg, Muehlebach, Michael
In many applications, learning systems are required to process continuous non-stationary data streams. We study this problem in an online learning framework and propose an algorithm that can deal with adversarial time-varying and nonlinear constraints. As we show in our work, the algorithm called Constraint Violation Velocity Projection (CVV-Pro) achieves $\sqrt{T}$ regret and converges to the feasible set at a rate of $1/\sqrt{T}$, despite the fact that the feasible set is slowly time-varying and a priori unknown to the learner. CVV-Pro only relies on local sparse linear approximations of the feasible set and therefore avoids optimizing over the entire set at each iteration, which is in sharp contrast to projected gradients or Frank-Wolfe methods. We also empirically evaluate our algorithm on two-player games, where the players are subjected to a shared constraint.
Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts
Achterhold, Jan, Tobuschat, Philip, Ma, Hao, Buechler, Dieter, Muehlebach, Michael, Stueckler, Joerg
In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approaches, which are not supplied with physical prior knowledge. We demonstrate that initializing the spin from parameters of the ball launcher using a neural network drastically improves long-time prediction performance over estimating the spin purely from measured ball positions. An accurate prediction of the ball trajectory is crucial for successful returns. We therefore evaluate the return performance with a pneumatic artificial muscular robot and achieve a return rate of 29/30 (97.7%).
Causal Effect Estimation from Observational and Interventional Data Through Matrix Weighted Linear Estimators
Kladny, Klaus-Rudolf, von Kรผgelgen, Julius, Schรถlkopf, Bernhard, Muehlebach, Michael
We study causal effect estimation from a mixture of observational and interventional data in a confounded linear regression model with multivariate treatments. We show that the statistical efficiency in terms of expected squared error can be improved by combining estimators arising from both the observational and interventional setting. To this end, we derive methods based on matrix weighted linear estimators and prove that our methods are asymptotically unbiased in the infinite sample limit. This is an important improvement compared to the pooled estimator using the union of interventional and observational data, for which the bias only vanishes if the ratio of observational to interventional data tends to zero. Studies on synthetic data confirm our theoretical findings. In settings where confounding is substantial and the ratio of observational to interventional data is large, our estimators outperform a Stein-type estimator and various other baselines.
Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification
Muehlebach, Michael
We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options. Compared to the classical prediction-with-expert-advice set-up, we consider situations where losses are constrained and derive algorithms that exploit the additional structure in optimal and computationally efficient ways. Our algorithm and our analysis is instance dependent, that is, suboptimal choices of the environment are exploited and reflected in our regret bounds. The constraints handle general dependencies between losses (even across time), and are flexible enough to also account for a loss budget, which the environment is not allowed to exceed. The performance of the resulting algorithms is highlighted in two numerical examples, which include a nonlinear and online system identification task.