Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management

Saifullah, M., Papakonstantinou, K. G., Andriotis, C. P., Stoffels, S. M.

arXiv.org Artificial Intelligence 

Optimal management of cross-asset infrastructure is a complex problem that requires inspection and maintenance policies capable of addressing the effects of stochastic degradation. According to the 2021 ASCE infrastructure report card [1], US infrastructure is in fair to poor condition, earning a cumulative grade of C-, with components nearing the end of their useful lives and at high risk of failure. Pavements and bridges are indicative examples of inadequate infrastructure: one in every five miles of pavement is in poor condition, and 7.5% of bridges are structurally deficient. Economic analyses indicate that the US Department of Transportation fell 50% short of the funds required to sustain the national transportation system [1], a shortfall that is also reflected in the resources available to individual State transportation agencies. The Virginia Department of Transportation, for example, reported that 50% of the State's bridges have exceeded their useful lives, and that the funds required to replace them are five times greater than the estimated available funds over the next fifty years [2]. Inspection and Maintenance (I&M) policies are therefore indispensable for efficiently distributing the available economic and environmental resources for transportation systems. Making optimal decisions in complex and uncertain environments presents a variety of difficulties, including heterogeneity of asset classes, a large number of components resulting in vast state and action spaces, unreliable observations, limited availability of resources, and several related risks. Optimal solutions that define inspection and maintenance policies should thus incorporate (i) learning from online and offline data, (ii) support for imperfect information, (iii) consideration of stochastic action outcomes, and (iv) optimization of long-term goals under multiple constraints (e.g., safety targets or resource constraints).