Constraint-Based Reasoning
BiConMP: A Nonlinear Model Predictive Control Framework for Whole Body Motion Planning
Meduri, Avadesh, Shah, Paarth, Viereck, Julian, Khadiv, Majid, Havoutis, Ioannis, Righetti, Ludovic
Online planning of whole-body motions for legged robots is challenging due to the inherent nonlinearity in the robot dynamics. In this work, we propose a nonlinear MPC framework, the BiConMP which can generate whole body trajectories online by efficiently exploiting the structure of the robot dynamics. BiConMP is used to generate various cyclic gaits on a real quadruped robot and its performance is evaluated on different terrain, countering unforeseen pushes and transitioning online between different gaits. Further, the ability of BiConMP to generate non-trivial acyclic whole-body dynamic motions on the robot is presented. The same approach is also used to generate various dynamic motions in MPC on a humanoid robot (Talos) and another quadruped robot (AnYmal) in simulation. Finally, an extensive empirical analysis on the effects of planning horizon and frequency on the nonlinear MPC framework is reported and discussed.
A Unifying Framework for Online Optimization with Long-Term Constraints
Castiglioni, Matteo, Celli, Andrea, Marchesi, Alberto, Romano, Giulia, Gatti, Nicola
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraints violation across the $T$ rounds. We present the first best-of-both-world type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear regret, where $\rho$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers it can handle the full-feedback as well as the bandit-feedback setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not packing (e.g., ROI constraints).
Solutions to preference manipulation in recommender systems require knowledge of meta-preferences
Iterative machine learning algorithms used to power recommender systems often change people's preferences by trying to learn them. Further a recommender can better predict what a user will do by making its users more predictable. Some preference changes on the part of the user are self-induced and desired whether the recommender caused them or not. This paper proposes that solutions to preference manipulation in recommender systems must take into account certain meta-preferences (preferences over another preference) in order to respect the autonomy of the user and not be manipulative.
Adversarial Examples in Constrained Domains
Sheatsley, Ryan, Papernot, Nicolas, Weisman, Michael, Verma, Gunjan, McDaniel, Patrick
Machine learning algorithms have been shown to be vulnerable to adversarial manipulation through systematic modification of inputs (e.g., adversarial examples) in domains such as image recognition. Under the default threat model, the adversary exploits the unconstrained nature of images; each feature (pixel) is fully under control of the adversary. However, it is not clear how these attacks translate to constrained domains that limit which and how features can be modified by the adversary (e.g., network intrusion detection). In this paper, we explore whether constrained domains are less vulnerable than unconstrained domains to adversarial example generation algorithms. We create an algorithm for generating adversarial sketches: targeted universal perturbation vectors which encode feature saliency within the envelope of domain constraints. To assess how these algorithms perform, we evaluate them in constrained (e.g., network intrusion detection) and unconstrained (e.g., image recognition) domains. The results demonstrate that our approaches generate misclassification rates in constrained domains that were comparable to those of unconstrained domains (greater than 95%). Our investigation shows that the narrow attack surface exposed by constrained domains is still sufficiently large to craft successful adversarial examples; and thus, constraints do not appear to make a domain robust. Indeed, with as little as five randomly selected features, one can still generate adversarial examples.
Safe reinforcement learning for multi-energy management systems with known constraint functions
Ceusters, Glenn, Camargo, Luis Ramirez, Franke, Rรผdiger, Nowรฉ, Ann, Messagie, Maarten
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori - reducing the upfront and ongoing project-specific engineering effort and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees - resulting in various potentially unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation. These provide hard-constraint, rather than soft- and chance-constraint, satisfaction guarantees both during training a (near) optimal policy (which involves exploratory and exploitative, i.e. greedy, steps) as well as during deployment of any policy (e.g. random agents or offline trained RL agents). This without the need of solving a mathematical program, resulting in less computational power requirements and a more flexible constraint function formulation (no derivative information is required). In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark and Optlayer benchmark (94,6% and 82,8% compared to 35,5% and 77,8%) and that the proposed SafeFallback method even can outperform the vanilla RL benchmark (102,9% to 100%). We conclude that both methods are viably safety constraint handling techniques applicable beyond RL, as demonstrated with random policies while still providing hard-constraint guarantees.
One Model, Any CSP: Graph Neural Networks as Fast Global Search Heuristics for Constraint Satisfaction
Tรถnshoff, Jan, Kisin, Berke, Lindner, Jakob, Grohe, Martin
We propose a universal Graph Neural Network architecture which can be trained as an end-2-end search heuristic for any Constraint Satisfaction Problem (CSP). Our architecture can be trained unsupervised with policy gradient descent to generate problem specific heuristics for any CSP in a purely data driven manner. The approach is based on a novel graph representation for CSPs that is both generic and compact and enables us to process every possible CSP instance with one GNN, regardless of constraint arity, relations or domain size. Unlike previous RL-based methods, we operate on a global search action space and allow our GNN to modify any number of variables in every step of the stochastic search. This enables our method to properly leverage the inherent parallelism of GNNs. We perform a thorough empirical evaluation where we learn heuristics for well known and important CSPs from random data, including graph coloring, MaxCut, 3-SAT and MAX-k-SAT. Our approach outperforms prior approaches for neural combinatorial optimization by a substantial margin. It can compete with, and even improve upon, conventional search heuristics on test instances that are several orders of magnitude larger and structurally more complex than those seen during training.
Explainability in Mechanism Design: Recent Advances and the Road Ahead
Suryanarayana, Sharadhi Alape, Sarne, David, Kraus, Sarit
Designing and implementing explainable systems is seen as the next step towards increasing user trust in, acceptance of and reliance on Artificial Intelligence (AI) systems. While explaining choices made by black-box algorithms such as machine learning and deep learning has occupied most of the limelight, systems that attempt to explain decisions (even simple ones) in the context of social choice are steadily catching up. In this paper, we provide a comprehensive survey of explainability in mechanism design, a domain characterized by economically motivated agents and often having no single choice that maximizes all individual utility functions. We discuss the main properties and goals of explainability in mechanism design, distinguishing them from those of Explainable AI in general. This discussion is followed by a thorough review of the challenges one may face when working on Explainable Mechanism Design and propose a few solution concepts to those.
Constraint-based Task Specification and Trajectory Optimization for Sequential Manipulation
Phoon, Mun Seng, Schmitt, Philipp S., Wichert, Georg v.
To economically deploy robotic manipulators the programming and execution of robot motions must be swift. To this end, we propose a novel, constraint-based method to intuitively specify sequential manipulation tasks and to compute time-optimal robot motions for such a task specification. Our approach follows the ideas of constraint-based task specification by aiming for a minimal and object-centric task description that is largely independent of the underlying robot kinematics. We transform this task description into a non-linear optimization problem. By solving this problem we obtain a (locally) time-optimal robot motion, not just for a single motion, but for an entire manipulation sequence. We demonstrate the capabilities of our approach in a series of experiments involving five distinct robot models, including a highly redundant mobile manipulator.
Exploration, Path Planning with Obstacle and Collision Avoidance in a Dynamic Environment
Alirezazadeh, Saeid, Alexandre, Luรญs A.
If we give a robot the task of moving an object from its current position to another location in an unknown environment, the robot must explore the map, identify all types of obstacles, and then determine the best route to complete the task. We proposed a mathematical model to find an optimal path planning that avoids collisions with all static and moving obstacles and has the minimum completion time and the minimum distance traveled. In this model, the bounding box around obstacles and robots is not considered, so the robot can move very close to the obstacles without colliding with them. We considered two types of obstacles: deterministic, which include all static obstacles such as walls that do not move and all moving obstacles whose movements have a fixed pattern, and non-deterministic, which include all obstacles whose movements can occur in any direction with some probability distribution at any time. We also consider the acceleration and deceleration of the robot to improve collision avoidance.
Refining Control Barrier Functions through Hamilton-Jacobi Reachability
Tonkens, Sander, Herbert, Sylvia
Safety filters based on Control Barrier Functions (CBFs) have emerged as a practical tool for the safety-critical control of autonomous systems. These approaches encode safety through a value function and enforce safety by imposing a constraint on the time derivative of this value function. However, synthesizing a valid CBF that is not overly conservative in the presence of input constraints is a notorious challenge. In this work, we propose refining a candidate CBF using formal verification methods to obtain a valid CBF. In particular, we update an expert-synthesized or backup CBF using dynamic programming (DP) based reachability analysis. Our framework, refineCBF, guarantees that with every DP iteration the obtained CBF is provably at least as safe as the prior iteration and converges to a valid CBF. Therefore, refineCBF can be used in-the-loop for robotic systems. We demonstrate the practicality of our method to enhance safety and/or reduce conservativeness on a range of nonlinear control-affine systems using various CBF synthesis techniques in simulation.