Delage, Erick
Planning and Learning in Average Risk-aware MDPs
Wang, Weikai, Delage, Erick
For continuing tasks, average-cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms: a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, confirm empirically the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk awareness of the agents they serve.
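To make the planning step concrete, here is a minimal tabular sketch of relative value iteration in which the expectation over next states is replaced by a one-step risk measure; CVaR is used purely as an illustration, all names are ours, and this is not the paper's exact algorithm:

```python
import numpy as np

def cvar_cost(values, probs, tail=0.1):
    """Average of the worst `tail` probability mass of a discrete cost
    distribution -- an illustrative one-step coherent risk measure."""
    order = np.argsort(-values)                  # worst (largest) costs first
    v, p = values[order], probs[order]
    w = np.clip(p - np.maximum(np.cumsum(p) - tail, 0.0), 0.0, None)
    return float(w @ v) / tail

def risk_aware_rvi(P, C, risk=cvar_cost, ref=0, iters=5000, tol=1e-9):
    """Relative value iteration with the expectation replaced by `risk`.
    P: (nS, nA, nS) transition probabilities, C: (nS, nA) costs."""
    nS, nA = C.shape
    h = np.zeros(nS)
    for _ in range(iters):
        q = np.array([[C[s, a] + risk(h, P[s, a]) for a in range(nA)]
                      for s in range(nS)])
        h_new = q.min(axis=1)
        h_new -= h_new[ref]                      # anchor at a reference state
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return h, q.argmin(axis=1)                   # relative values, greedy policy
```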
Navigating Demand Uncertainty in Container Shipping: Deep Reinforcement Learning for Enabling Adaptive and Feasible Master Stowage Planning
van Twiller, Jaike, Adulyasak, Yossiri, Delage, Erick, Grbic, Djordje, Jensen, Rune Møller
Reinforcement learning (RL) has shown promise in solving various combinatorial optimization problems. However, conventional RL faces challenges when dealing with real-world constraints, especially when the feasibility of the action space depends explicitly on the current state or trajectory. In this work, we focus on using RL in container shipping, often considered the cornerstone of global trade, by addressing the critical challenge of master stowage planning. The main objective is to maximize cargo revenue and minimize operational costs while navigating demand uncertainty and various complex operational constraints, namely vessel capacity and stability, which must be dynamically updated along the vessel's voyage. To address this problem, we implement a deep reinforcement learning framework with feasibility projection to solve the master stowage planning problem (MPP) under demand uncertainty. The experimental results show that our architecture efficiently finds adaptive, feasible solutions for this multi-stage stochastic optimization problem, outperforming traditional mixed-integer programming and RL with feasibility regularization. Our AI-driven decision-support policy enables adaptive and feasible planning under uncertainty, optimizing operational efficiency and capacity utilization while contributing to sustainable and resilient global supply chains.
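As an illustration of the feasibility-projection idea, the sketch below maps a raw network output onto a polytope of feasible plans by solving a small quadratic program; the polytope form and all names are our assumptions, not the paper's formulation:

```python
import cvxpy as cp
import numpy as np

def project_to_feasible(x_hat, A, b):
    """Euclidean projection of a raw action x_hat onto {x : A x <= b, x >= 0}.
    In a stowage setting, A and b would encode capacity and stability limits."""
    x = cp.Variable(len(x_hat))
    problem = cp.Problem(cp.Minimize(cp.sum_squares(x - x_hat)),
                         [A @ x <= b, x >= 0])
    problem.solve()
    return x.value
```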
Mitigating optimistic bias in entropic risk estimation and optimization with an application to insurance
Sadana, Utsav, Delage, Erick, Georghiou, Angelos
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss. With limited data, the empirical entropic risk estimator, i.e., replacing the expectation in the entropic risk measure with a sample average, underestimates the true risk. To mitigate the bias in the empirical entropic risk estimator, we propose a strongly asymptotically consistent bootstrapping procedure. The first step of the procedure involves fitting a distribution to the data, whereas the second step estimates the bias of the empirical entropic risk estimator using bootstrapping, and corrects for it. Two methods are proposed to fit a Gaussian Mixture Model to the data: a computationally intensive one that fits the distribution of the empirical entropic risk, and a simpler one with a component dedicated to fitting the tail of the empirical distribution. As an application of our approach, we study distributionally robust entropic risk minimization problems with a type-$\infty$ Wasserstein ambiguity set and apply our bias correction to debias validation performance. Furthermore, we propose a distributionally robust optimization model for an insurance contract design problem that takes into account the correlations of losses across households. We show that choosing regularization parameters based on cross-validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for. This improvement in performance can be explained by the observation that our methods suggest a higher (and more accurate) premium to homeowners.
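For reference, the entropic risk of a loss $X$ at aversion level $\alpha > 0$ is $\frac{1}{\alpha}\log \mathbb{E}[e^{\alpha X}]$; since $\log$ is concave, replacing the expectation with a sample average is downward biased. Below is a minimal sketch of the estimator and of a parametric-bootstrap correction in the spirit of the paper; the fitted-model interface is a hypothetical placeholder:

```python
import numpy as np

def entropic_risk(x, alpha):
    """Empirical entropic risk: (1/alpha) * log(mean(exp(alpha * x))),
    computed via log-sum-exp for numerical stability."""
    m = np.max(alpha * x)
    return (m + np.log(np.mean(np.exp(alpha * x - m)))) / alpha

def debiased_entropic_risk(x, alpha, fit, n_boot=500, seed=None):
    """Bootstrap bias correction (a sketch of the idea, not the paper's exact
    procedure). `fit(x)` is a hypothetical fitting routine returning a model
    with .rvs(n, rng) for resampling and .entropic_risk(alpha) in closed form."""
    rng = np.random.default_rng(seed)
    model = fit(x)                        # e.g. a Gaussian Mixture Model
    n = len(x)
    boot = [entropic_risk(model.rvs(n, rng), alpha) for _ in range(n_boot)]
    bias = model.entropic_risk(alpha) - np.mean(boot)
    return entropic_risk(x, alpha) + bias
```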
Fair Resource Allocation in Weakly Coupled Markov Decision Processes
Tu, Xiaohui, Adulyasak, Yossiri, Akbarzadeh, Nima, Delage, Erick
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of $N$ sub-Markov decision processes (sub-MDPs) that would otherwise operate independently. We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objective. After introducing a general but computationally prohibitive solution scheme based on linear programming, we focus on the homogeneous case where all sub-MDPs are identical. For this case, we show for the first time that the problem reduces to optimizing the utilitarian objective over the class of "permutation-invariant" policies. This result is particularly useful: in the restless bandit setting, we can exploit Whittle index policies, while for the more general setting, we introduce a count-proportion-based deep reinforcement learning approach. Finally, we validate our theoretical findings with comprehensive experiments, confirming the effectiveness of our proposed method in achieving fairness.
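For reference, with utilities $u_1, \dots, u_N$ accrued by the $N$ sub-MDPs and fixed nonincreasing weights, the generalized Gini function is commonly written as

$$\mathrm{GGF}_{w}(u) \;=\; \sum_{i=1}^{N} w_i\, u_{(i)}, \qquad w_1 \ge w_2 \ge \cdots \ge w_N \ge 0,$$

where $u_{(1)} \le \cdots \le u_{(N)}$ sorts utilities from worst off to best off. Placing larger weights on worse-off sub-MDPs encodes fairness, while uniform weights recover the utilitarian total-sum objective.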
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
Hau, Jia Lin, Delage, Erick, Derman, Esther, Ghavamzadeh, Mohammad, Petrik, Marek
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard way of modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations, and it serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
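As a building block, the quantile of a return distribution can be tracked with the classic stochastic-approximation update on the pinball loss, the kind of incremental rule a quantile (VaR) Q-learning method can be built on; this is not the paper's exact algorithm:

```python
import numpy as np

def track_quantile(samples, tau, lr=0.05):
    """Stochastic-approximation estimate of the tau-quantile: each incoming
    sample triggers a gradient step on the pinball loss."""
    q = 0.0
    for x in samples:
        q += lr * (tau - float(x < q))
    return q

rng = np.random.default_rng(0)
print(track_quantile(rng.normal(size=100_000), tau=0.9))  # ~1.28 for N(0,1)
```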
Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem
Akbarzadeh, Nima, Delage, Erick, Adulyasak, Yossiri
In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem, which has a risk-neutral objective, by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on the Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments.
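For context, in the classical risk-neutral setting the Whittle index of a state is the passive-action subsidy at which acting and resting become equally attractive. A minimal discounted sketch via bisection follows; the paper's risk-aware indices would replace the expectation with a risk measure, and all names here are ours:

```python
import numpy as np

def whittle_index(P, r, s, gamma=0.95, lo=-10.0, hi=10.0, tol=1e-4):
    """Bisection on the passive-action subsidy `lam` for a single arm.
    P: (2, nS, nS) transitions for actions {0: passive, 1: active},
    r: (2, nS) rewards. Returns the subsidy making the agent indifferent
    between the two actions in state `s`."""
    def action_gap(lam):
        V = np.zeros(r.shape[1])
        for _ in range(2000):
            q0 = r[0] + lam + gamma * P[0] @ V   # passive, subsidized
            q1 = r[1] + gamma * P[1] @ V         # active
            V_new = np.maximum(q0, q1)
            if np.max(np.abs(V_new - V)) < 1e-10:
                break
            V = V_new
        return q1[s] - q0[s]
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if action_gap(lam) > 0:
            lo = lam                             # subsidy too small
        else:
            hi = lam
    return 0.5 * (lo + hi)
```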
End-to-end Conditional Robust Optimization
Chenreddy, Abhilash, Delage, Erick
The field of Contextual Optimization (CO) integrates machine learning and optimization to solve decision-making problems under uncertainty. Recently, a risk-sensitive variant of CO known as Conditional Robust Optimization (CRO) has emerged, combining uncertainty quantification with robust optimization in order to promote safety and reliability in high-stakes applications. Exploiting modern differentiable optimization methods, we propose a novel end-to-end approach to train a CRO model in a way that accounts for both the empirical risk of the prescribed decisions and the quality of conditional coverage of the contextual uncertainty set that supports them. While guarantees of success for the latter objective are impossible to obtain from the point of view of conformal prediction theory, high-quality conditional coverage is achieved empirically by ingeniously employing a logistic regression differentiable layer within the calculation of coverage quality in our training loss. We show that the proposed training algorithms produce decisions that outperform the traditional estimate-then-optimize approaches.
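One way to picture the coverage term, under our reading of the abstract and with all names and the exact penalty being assumptions: a small differentiable logistic model predicts, from the context, whether the realized outcome landed in the prescribed uncertainty set, and its predictions are pushed toward the nominal coverage level:

```python
import torch

def coverage_quality_loss(phi, features, covered, target=0.9):
    """`phi` is a differentiable logistic model; `covered` holds 0/1 floats
    indicating whether each realized sample fell inside its uncertainty set.
    The loss fits phi to the coverage events and penalizes deviations of
    the predicted conditional coverage from `target`."""
    p = torch.sigmoid(phi(features)).squeeze(-1)   # estimated P(cover | x)
    fit = torch.nn.functional.binary_cross_entropy(p, covered)
    return fit + ((p - target) ** 2).mean()
```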
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
Hau, Jia Lin, Delage, Erick, Ghavamzadeh, Mohammad, Petrik, Marek
Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal when the risk level is discretized sufficiently. However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level. In particular, we show that a saddle point property assumed to hold in prior literature may be violated. However, a decomposition does hold for Value-at-Risk and our proof demonstrates how this risk measure differs from CVaR and EVaR. Our findings are significant because risk-averse algorithms are used in high-stakes environments, making their correctness much more critical.
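For context, these decompositions build on the Rockafellar–Uryasev representation of CVaR, stated here under one common convention for a loss $X$ at tail level $\alpha$:

$$\mathrm{CVaR}_{\alpha}(X) \;=\; \min_{z \in \mathbb{R}} \left\{ z + \tfrac{1}{\alpha}\, \mathbb{E}\big[(X - z)_{+}\big] \right\};$$

the saddle point property referenced above concerns interchanging such inner minimizations with other optimization steps in the dynamic program.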
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty
Sadana, Utsav, Chenreddy, Abhilash, Delage, Erick, Forel, Alexandre, Frejinger, Emma, Vidal, Thibaut
Recently, there has been a surge of interest in the operations research (OR) and machine learning (ML) communities in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty. This has given rise to the field of contextual optimization, under which data-driven procedures are developed to prescribe actions to the decision-maker that make the best use of the most recently updated information. A large variety of models and methods have been presented in both the OR and ML literature under a variety of names, including data-driven optimization, prescriptive optimization, predictive stochastic programming, policy optimization, (smart) predict/estimate-then-optimize, decision-focused learning, (task-based) end-to-end learning/forecasting/optimization, etc. Focusing on single-stage and two-stage stochastic programming problems, this review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under uniform notation and terminology and classify them according to the three main frameworks identified. Our objective with this survey is both to strengthen the general understanding of this active field of research and to stimulate further theoretical and algorithmic advancements in integrating ML and stochastic programming.
Robust Data-driven Prescriptiveness Optimization
Poursoltani, Mehran, Delage, Erick, Georghiou, Angelos
The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
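For reference, the coefficient of prescriptiveness, in the spirit of its original definition by Bertsimas and Kallus (notation is ours), compares a contextual policy $\pi$ against a context-free reference decision $z_{\mathrm{ref}}$, normalized by the gain available under full information:

$$P \;=\; 1 \;-\; \frac{\hat{\mathbb{E}}\big[c(\pi(X), Y)\big] \;-\; \hat{\mathbb{E}}\big[\min_{z} c(z, Y)\big]}{\hat{\mathbb{E}}\big[c(z_{\mathrm{ref}}, Y)\big] \;-\; \hat{\mathbb{E}}\big[\min_{z} c(z, Y)\big]},$$

so that $P = 0$ for the reference decision and $P = 1$ under perfect foresight.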