Learning Soft Driving Constraints from Vectorized Scene Embeddings while Imitating Expert Trajectories

Mobarakeh, Niloufar Saeidi, Khamidehi, Behzad, Li, Chunlin, Mirkhani, Hamidreza, Arasteh, Fazel, Elmahgiubi, Mohammed, Zhang, Weize, Rezaee, Kasra, Poupart, Pascal

arXiv.org Artificial Intelligence

The primary goal of motion planning is to generate safe and efficient trajectories for vehicles. Traditionally, motion planning models are trained using imitation learning to mimic the behavior of human experts. However, these models often lack interpretability and fail to provide clear justifications for their decisions. We propose a method that integrates constraint learning into imitation learning by extracting driving constraints from expert trajectories. Our approach utilizes vectorized scene embeddings that capture critical spatial and temporal features, enabling the model to identify and generalize constraints across various driving scenarios. We formulate the constraint learning problem using a maximum entropy model, which scores the motion planner's trajectories based on their similarity to the expert trajectory. By separating the scoring process into distinct reward and constraint streams, we improve both the interpretability of the planner's behavior and its attention to relevant scene components. Unlike existing constraint learning methods that rely on simulators and are typically embedded in reinforcement learning (RL) or inverse reinforcement learning (IRL) frameworks, our method operates without simulators, making it applicable to a wider range of datasets and real-world scenarios. Experimental results on the InD and TrafficJams datasets demonstrate that incorporating driving constraints enhances model interpretability and improves closed-loop performance.
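As a minimal illustrative sketch (not the paper's architecture), the split scoring idea can be shown as a maximum entropy distribution over candidate trajectories, where each candidate receives a reward score and a separate constraint score; the function and all score values below are hypothetical:

```python
import math

def maxent_probs(reward_scores, constraint_scores):
    # Combined score per trajectory: reward stream minus constraint stream.
    scores = [r - c for r, c in zip(reward_scores, constraint_scores)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Three hypothetical planner trajectories; index 0 is closest to the expert.
reward_scores = [2.0, 1.5, 1.8]
constraint_scores = [0.1, 0.2, 1.5]  # trajectory 2 violates a soft constraint
probs = maxent_probs(reward_scores, constraint_scores)
print(probs.index(max(probs)))  # -> 0: the expert-like trajectory scores highest
```

Keeping the two streams separate means one can inspect how much of a trajectory's score comes from reward versus constraint violation, which is the interpretability benefit the abstract points to.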


Sliding Window 3-Objective Pareto Optimization for Problems with Chance Constraints

Neumann, Frank, Witt, Carsten

arXiv.org Artificial Intelligence

Multi-objective formulations have been widely used to solve single-objective optimization problems. The initial study carried out by Knowles et al. [8] for the H-IFF and the traveling salesperson problem shows that such formulations can significantly reduce the number of local optima in the search space and introduces the term multi-objectivization for such approaches. Using multi-objective formulations to solve constrained single-objective optimization problems by evolutionary multi-objective optimization, with the constraint treated as an additional objective, has been shown to be highly beneficial for a wide range of problems [4, 9, 12]. Treating the constraint as an additional objective allows simple evolutionary multi-objective algorithms such as GSEMO to mimic a greedy behaviour and, as a consequence, to achieve the theoretically best possible performance guarantees for a wide range of constrained submodular optimization problems [17-19]. Such approaches have recently been widely studied under the term Pareto optimization in the artificial intelligence and machine learning literature [22]. In the context of problems with stochastic constraints, it has recently been shown that 3-objective formulations, where the given constraint is relaxed into a third objective, lead to better performance than 2-objective formulations that optimize the expected value and variance of the given stochastic components under the given constraint [14, 15].
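A toy sketch of the underlying idea (assumptions: a OneMax-style problem with the cardinality constraint sum(x) <= k turned into a second objective, and a GSEMO-style archive of non-dominated solutions; this illustrates multi-objectivization generally, not the paper's 3-objective sliding-window method):

```python
import random

def objectives(x, k):
    # Two maximized objectives: solution value, and negated violation of
    # the cardinality constraint sum(x) <= k treated as an extra objective.
    ones = sum(x)
    return (ones, -max(0, ones - k))

def dominates(a, b):
    return all(ai >= bi for ai, bi in zip(a, b)) and a != b

def gsemo(n=20, k=10, iters=5000, seed=1):
    rng = random.Random(seed)
    x = tuple([0] * n)
    archive = {objectives(x, k): x}  # one solution per objective vector
    for _ in range(iters):
        parent = rng.choice(list(archive.values()))
        # Standard bit mutation: flip each bit independently with prob 1/n.
        child = tuple(b ^ (rng.random() < 1.0 / n) for b in parent)
        fc = objectives(child, k)
        if not any(dominates(f, fc) for f in archive):
            archive = {f: s for f, s in archive.items() if not dominates(fc, f)}
            archive[fc] = child
    return archive

archive = gsemo()
best_feasible = max(v for v, viol in archive if viol == 0)
print(best_feasible)  # reaches the constraint boundary k = 10
```

Because the constraint is an objective, infeasible solutions stay on the Pareto front and serve as stepping stones, which is the mechanism behind the greedy-like behaviour the abstract describes.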


Statistical learning for constrained functional parameters in infinite-dimensional models with applications in fair machine learning

Nabi, Razieh, Hejazi, Nima S., van der Laan, Mark J., Benkeser, David

arXiv.org Machine Learning

Constrained learning has become increasingly important, especially in the realm of algorithmic fairness and machine learning. In these settings, predictive models are developed specifically to satisfy pre-defined notions of fairness. Here, we study the general problem of constrained statistical machine learning through a statistical functional lens. We consider learning a function-valued parameter of interest under the constraint that one or several pre-specified real-valued functional parameters equal zero or are otherwise bounded. We characterize the constrained functional parameter as the minimizer of a penalized risk criterion using a Lagrange multiplier formulation. We show that closed-form solutions for the optimal constrained parameter are often available, providing insight into mechanisms that drive fairness in predictive models. Our results also suggest natural estimators of the constrained parameter that can be constructed by combining estimates of unconstrained parameters of the data generating distribution. Thus, our estimation procedure for constructing fair machine learning algorithms can be applied in conjunction with any statistical learning approach and off-the-shelf software. We demonstrate the generality of our method by explicitly considering a number of examples of statistical fairness constraints and implementing the approach using several popular learning approaches.
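The Lagrange multiplier characterization described in the abstract can be written schematically as follows (notation illustrative: theta is the function-valued parameter, R the risk, Phi the real-valued constraint functional):

```latex
% Constrained problem: minimize risk subject to the functional constraint.
\theta^{*} = \arg\min_{\theta \in \Theta} R(\theta)
\quad \text{s.t.} \quad \Phi(\theta) = 0 .
% Penalized (Lagrangian) form: for a multiplier \lambda,
\theta_{\lambda} = \arg\min_{\theta \in \Theta}
\bigl\{ R(\theta) + \lambda \, \Phi(\theta) \bigr\},
% with \lambda chosen so that \Phi(\theta_{\lambda}) = 0 .
```

The practical consequence noted in the abstract is that theta_lambda can often be expressed in terms of unconstrained parameters of the data-generating distribution, so any off-the-shelf learner can supply the ingredients.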


Learning Soft Constraints From Constrained Expert Demonstrations

Gaurav, Ashish, Rezaee, Kasra, Liu, Guiliang, Poupart, Pascal

arXiv.org Artificial Intelligence

Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function. However, in many settings, the agent may optimize a reward function subject to constraints, where the constraints induce behaviors that may otherwise be difficult to express with a reward function alone. We consider the setting where the reward function is given and the constraints are unknown, and propose a method that can recover these constraints satisfactorily from the expert data. While previous work has focused on recovering hard constraints, our method can recover cumulative soft constraints that the agent satisfies on average per episode. In IRL fashion, our method solves this problem by adjusting the constraint function iteratively through a constrained optimization procedure, until the agent behavior matches the expert behavior. We demonstrate our approach on synthetic environments, robotics environments, and real-world highway driving scenarios.
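A minimal sketch of the alternating idea on a toy problem (everything below — the five-state setup, the softmax "agent", the per-state constraint values — is hypothetical and far simpler than the paper's method): the constraint function is raised wherever the agent visits more than the expert, until the two visitation distributions match.

```python
import math

# Toy setup: five states; state 3 has the highest reward, yet the expert
# rarely visits it -- behavior that reward alone does not explain.
reward = [1.0, 1.0, 1.0, 2.0, 1.0]
expert_visits = [0.24, 0.24, 0.24, 0.04, 0.24]

def softmax_policy(reward, cost):
    # Agent visits states in proportion to exp(reward - constraint cost).
    logits = [r - c for r, c in zip(reward, cost)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

cost = [0.0] * 5  # learnable soft-constraint value per state
eta = 0.5
for _ in range(800):
    agent = softmax_policy(reward, cost)
    # Iterative adjustment: raise the cost where the agent over-visits
    # relative to the expert, lower it where the agent under-visits.
    cost = [c + eta * (a - e) for c, a, e in zip(cost, agent, expert_visits)]

agent = softmax_policy(reward, cost)
print(round(agent[3], 3))  # matches the expert's low visitation of state 3
```

The learned cost concentrates on state 3, i.e. the procedure attributes the expert's avoidance of a high-reward state to a soft constraint rather than to the reward.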


Pragmatic Fairness: Developing Policies with Outcome Disparity Control

Gultchin, Limor, Guo, Siyuan, Malek, Alan, Chiappa, Silvia, Silva, Ricardo

arXiv.org Artificial Intelligence

We introduce a causal framework for designing optimal policies that satisfy fairness constraints. We take a pragmatic approach, asking what we can do with the action space available to us and with access only to historical data. We propose two different fairness constraints: a moderation breaking constraint, which aims to block moderation paths from the action and sensitive attribute to the outcome, thereby reducing disparity in outcome levels as much as the provided action space permits; and an equal benefit constraint, which aims to distribute the gain from the new, maximized policy equally across sensitive attribute levels, thus keeping pre-existing preferential treatment in place or avoiding the introduction of new disparity. We introduce practical methods for implementing the constraints and illustrate their use in experiments with semi-synthetic models.


Projection-Based Constrained Policy Optimization

Yang, Tsung-Yen, Rosca, Justinian, Narasimhan, Karthik, Ramadge, Peter J.

arXiv.org Artificial Intelligence

We consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy Optimization (PCPO). This is an iterative method for optimizing policies in a two-step process: the first step performs a local reward improvement update, while the second step reconciles any constraint violation by projecting the policy back onto the constraint set. We theoretically analyze PCPO and provide a lower bound on reward improvement, and an upper bound on constraint violation, for each policy update. We further characterize the convergence of PCPO based on two different metrics: the L2 norm and the Kullback-Leibler divergence. Our empirical results over several control tasks demonstrate that PCPO achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
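A minimal parameter-space sketch of the two-step update (toy linear reward and a single hypothetical linear constraint; the actual algorithm projects in policy space with trust-region machinery, so this shows only the structure):

```python
# Toy PCPO-style iteration: a reward-improvement step followed by an
# L2 projection back onto the constraint set {theta : a . theta <= b}.
a, b = (1.0, 2.0), 1.0  # hypothetical linear constraint

def project(theta):
    # Euclidean projection onto the half-space a . theta <= b.
    g = a[0] * theta[0] + a[1] * theta[1] - b
    if g <= 0:
        return theta  # already feasible
    n2 = a[0] ** 2 + a[1] ** 2
    return (theta[0] - g * a[0] / n2, theta[1] - g * a[1] / n2)

theta, lr = (0.0, 0.0), 0.1
for _ in range(100):
    grad = (1.0, 1.0)  # gradient of the toy reward r(theta) = theta0 + theta1
    theta = (theta[0] + lr * grad[0], theta[1] + lr * grad[1])  # step 1: reward
    theta = project(theta)                                      # step 2: project

violation = max(0.0, a[0] * theta[0] + a[1] * theta[1] - b)
print(violation)  # the projection keeps each iterate (numerically) feasible
```

Once the iterate reaches the boundary, the projection cancels the infeasible component of each reward step, so the parameters slide along the constraint surface in the direction that still improves reward.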