Markov Models
Self-supervised diffusion model fine-tuning for costate initialization using Markov chain Monte Carlo
Graebner, Jannik, Beeson, Ryne
Global search and optimization of long-duration, low-thrust spacecraft trajectories with the indirect method is challenging due to a complex solution space and the difficulty of generating good initial guesses for the costate variables. This is particularly true in multibody environments. Given data that reveals a partial Pareto optimal front, it is desirable to find a flexible manner in which the Pareto front can be completed and fronts for related trajectory problems can be found. In this work we use conditional diffusion models to represent the distribution of candidate optimal trajectory solutions. We then introduce into this framework the novel approach of using Markov chain Monte Carlo (MCMC) algorithms with self-supervised fine-tuning to achieve the aforementioned goals. Specifically, a random walk Metropolis algorithm is employed to propose new data that can be used to fine-tune the diffusion model using reward-weighted training based on efficient evaluations of constraint violations and mission objective functions. The framework removes the need for separate, focused, and often tedious data generation phases. Numerical experiments are presented for two problems, demonstrating the ability to improve sample quality and explicitly target Pareto optimality based on the theory of Markov chains. The first problem does so for a transfer in the Jupiter-Europa circular restricted three-body problem, where the MCMC approach completes a partial Pareto front. The second problem demonstrates how the MCMC self-supervised fine-tuning method, starting from the Jupiter-Europa case, can generate a denser and superior Pareto front for a Saturn-Titan transfer compared with a separate dedicated global search.
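For readers unfamiliar with the random walk Metropolis algorithm mentioned in the abstract, the following is a minimal generic sketch in Python. It is not the authors' implementation: the function names, the Gaussian proposal, the step size, and the toy standard-normal target are all assumptions chosen for illustration. In the paper's setting the target would instead be built from constraint violations and mission objectives.

```python
import math
import random


def log_standard_normal(x):
    """Toy log-density (up to a constant); an assumed stand-in for a real target."""
    return -0.5 * sum(xi * xi for xi in x)


def random_walk_metropolis(log_target, x0, n_steps, step_size=0.5, seed=0):
    """Generic random walk Metropolis sampler with a symmetric Gaussian proposal.

    Returns the list of visited states and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x = list(x0)
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose a symmetric perturbation of the current state.
        proposal = [xi + rng.gauss(0.0, step_size) for xi in x]
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_steps
```

Calling `random_walk_metropolis(log_standard_normal, [0.0], 20000, step_size=1.0)` yields a chain whose empirical mean should be close to zero; the acceptance rate gives a rough indication of whether the step size is reasonable.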
Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. This paper aims at (i) formally understanding the equilibria reached by such agents, and (ii) matching emergent phenomena of such equilibria to real-world targets.
32b30a250abd6331e03a2a1f16466346-Reviews.html
The paper proposes an estimation strategy for recovering the parameters of a finite-state Markov chain given observed stationary frequencies of some states. Although the proposed problem is potentially very interesting, the paper does not appear to be in a mature state. Some fundamental issues are not adequately addressed, the evaluation is limited, and the writing quality is not strong. Note that when the number of observed stationary state frequencies is small relative to the total number of states, there is an uncountable set of ergodic transition models that can exactly match the given subset of stationary frequencies.
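The identifiability concern can be illustrated with a small sketch. The example below is an assumption made for illustration and is not taken from the paper under review: it builds a one-parameter family of 3-state ergodic chains (self-transition probability `a`, remaining mass split evenly) and checks by power iteration that every member has the same uniform stationary distribution. Even with all stationary frequencies observed, the transition matrix is therefore not pinned down, and observing only a subset leaves still more freedom.

```python
def symmetric_chain(a):
    """3-state transition matrix: stay with probability a, else move uniformly."""
    off = (1.0 - a) / 2.0
    return [[a if i == j else off for j in range(3)] for i in range(3)]


def stationary(P, n_iter=500):
    """Approximate the stationary distribution by repeated multiplication pi <- pi P."""
    n = len(P)
    pi = [1.0] + [0.0] * (n - 1)  # start with all mass on state 0
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Distinct transition matrices, one per value of a in (0, 1).
chains = {a: symmetric_chain(a) for a in (0.1, 0.3, 0.5, 0.7, 0.9)}
distributions = {a: stationary(P) for a, P in chains.items()}
```

Every entry of `distributions` is numerically the uniform vector, although the underlying matrices differ. Since `a` can take any value in the open interval (0, 1), the family is uncountable.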