Kim, Joseph
Investigation of risk-aware MDP and POMDP contingency management autonomy for UAS
Sharma, Prashin, Kraske, Benjamin, Kim, Joseph, Laouar, Zakariya, Sunberg, Zachary, Atkins, Ella
Unmanned aircraft systems (UAS) are being adopted for an increasing range of applications, and the risk they pose to people and property must be kept to acceptable levels. This paper proposes risk-aware contingency management autonomy to prevent an accident in the event of component malfunction, specifically propulsion unit failure and/or battery degradation. The proposed autonomy is modeled as a Markov Decision Process (MDP) whose solution is a contingency management policy that appropriately selects among emergency landing, flight termination, and continuation of the planned flight. Motivated by the potential for errors in fault/failure indicators, partial observability of the MDP state space is also investigated. The performance of optimal policies is analyzed over varying observability conditions in a high-fidelity simulator. Results indicate that both partially observable MDP (POMDP) and maximum a posteriori MDP policies performed similarly across the different state observability criteria, given the nearly deterministic state transition model.
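To make the decision model concrete, the following is a minimal sketch of a contingency-management MDP solved by value iteration. The states, actions, transition probabilities, and rewards are hypothetical placeholders for illustration only; they are not the quantities used in the paper.

```python
# Toy contingency-management MDP solved by value iteration.
# All numbers below are invented for illustration.
import numpy as np

states = ["nominal", "battery_degraded", "motor_failed", "landed_safe", "crashed"]
actions = ["continue", "emergency_land", "terminate_flight"]

# T[a][s, s'] = P(s' | s, a); each row sums to 1 (hypothetical values).
T = {
    "continue": np.array([
        [0.97, 0.02, 0.01, 0.00, 0.00],
        [0.00, 0.90, 0.00, 0.05, 0.05],
        [0.00, 0.00, 0.70, 0.05, 0.25],
        [0.00, 0.00, 0.00, 1.00, 0.00],
        [0.00, 0.00, 0.00, 0.00, 1.00],
    ]),
    "emergency_land": np.array([
        [0.00, 0.00, 0.00, 0.98, 0.02],
        [0.00, 0.00, 0.00, 0.95, 0.05],
        [0.00, 0.00, 0.00, 0.80, 0.20],
        [0.00, 0.00, 0.00, 1.00, 0.00],
        [0.00, 0.00, 0.00, 0.00, 1.00],
    ]),
    "terminate_flight": np.array([
        [0.00, 0.00, 0.00, 0.90, 0.10],
        [0.00, 0.00, 0.00, 0.90, 0.10],
        [0.00, 0.00, 0.00, 0.85, 0.15],
        [0.00, 0.00, 0.00, 1.00, 0.00],
        [0.00, 0.00, 0.00, 0.00, 1.00],
    ]),
}

# Per-state reward for each action: favors continuing a healthy flight,
# penalizes aborting unnecessarily (hypothetical values).
R = {"continue":         np.array([ 1.0,  0.5, -1.0, 0.0, 0.0]),
     "emergency_land":   np.array([-2.0, -0.5,  0.5, 0.0, 0.0]),
     "terminate_flight": np.array([-5.0, -3.0, -1.0, 0.0, 0.0])}

gamma = 0.95
V = np.zeros(len(states))
for _ in range(500):  # value iteration
    Q = np.stack([R[a] + gamma * T[a] @ V for a in actions])
    V = Q.max(axis=0)

policy = {s: actions[i] for s, i in zip(states, Q.argmax(axis=0))}
print(policy)  # maps each (fully observed) state to a contingency action
```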
Deep Reinforcement Learning for Cost-Effective Medical Diagnosis
Yu, Zheng, Li, Yikuan, Kim, Joseph, Huang, Kaixuan, Luo, Yuan, Wang, Mengdi
Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score instead of the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, which invalidates standard RL methods. To remedy this issue, we develop a reward shaping approach, leveraging properties of the $F_1$ score and duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO achieves state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to an $85\%$ reduction in testing cost. The code is available at https://github.com/Zheng321/Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis.
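As a rough illustration of the problem setup, the sketch below implements a toy sequential test-selection environment with a cost-penalized, class-weighted terminal reward in the spirit of $F_1$-oriented shaping. The feature layout, panel costs, class weights, and reward form are assumptions for illustration; the paper's SM-DDPO framework and its exact reward shaping are not reproduced here.

```python
# Toy sequential test-selection environment (one feature per "panel" for simplicity).
# Costs, weights, and reward shape are invented for illustration.
import numpy as np

class TestSelectionEnv:
    def __init__(self, x, y, panel_costs, w_pos=5.0, w_neg=1.0, test_cost_scale=0.1):
        self.x, self.y = x, y                    # full lab results and label for one patient
        self.panel_costs = panel_costs           # cost of ordering each test (hypothetical)
        self.w_pos, self.w_neg = w_pos, w_neg    # class weights standing in for F1-style trade-off
        self.test_cost_scale = test_cost_scale
        self.mask = np.zeros_like(x, dtype=bool) # which tests have been ordered so far

    def state(self):
        # Observed values where ordered, zeros elsewhere, plus the observation mask.
        return np.concatenate([np.where(self.mask, self.x, 0.0), self.mask.astype(float)])

    def step(self, action):
        if action < len(self.panel_costs):       # order a test: pay its (scaled) cost
            self.mask[action] = True
            return self.state(), -self.test_cost_scale * self.panel_costs[action], False
        pred = action - len(self.panel_costs)    # terminal action: predict 0 or 1
        if pred == 1 and self.y == 1:
            reward = self.w_pos                  # true positives rewarded more heavily
        elif pred == 0 and self.y == 0:
            reward = self.w_neg
        else:
            reward = 0.0
        return self.state(), reward, True

# Minimal usage with made-up patient data.
env = TestSelectionEnv(x=np.array([1.2, 0.4, 3.3]), y=1, panel_costs=np.array([2.0, 1.0, 5.0]))
s, r, done = env.step(0)                              # order test 0
s, r, done = env.step(len(env.panel_costs) + 1)       # predict positive and terminate
print(r, done)
```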
Towards Intelligent Decision Support in Human Team Planning
Kim, Joseph (Massachusetts Institute of Technology) | Shah, Julie A. (Massachusetts Institute of Technology)
Inherent human limitations in teaming environments, coupled with complex planning problems, spur the integration of intelligent decision support (IDS) systems for human-agent planning. However, prior research in human-agent planning has been limited to dyadic interaction between a single human and a single planning agent. In this paper, we highlight an emerging research area of IDS for human team planning, i.e., environments where the agent works with a team of human planners to enhance the quality of their plans and the ease of making them. We review prior work in human-agent planning and identify research challenges for an agent participating in human team planning.
Collaborative Planning with Encoding of Users' High-Level Strategies
Kim, Joseph (Massachusetts Institute of Technology) | Banks, Christopher J. (Norfolk State University) | Shah, Julie A. (Massachusetts Institute of Technology)
The generation of near-optimal plans for multi-agent systems with numerical states and temporal actions is computationally challenging. Current off-the-shelf planners can take a very long time before generating a near-optimal solution. In an effort to reduce plan computation time, increase the quality of the resulting plans, and make them more interpretable by humans, we explore collaborative planning techniques that actively involve human users in plan generation. Specifically, we explore a framework in which users provide high-level strategies encoded as soft preferences to guide the low-level search of the planner. Through human subject experimentation, we empirically demonstrate that this approach results in statistically significant improvements to plan quality, without substantially increasing computation time. We also show that the resulting plans achieve greater similarity to those generated by humans with regard to the produced sequences of actions, as compared to plans that do not incorporate user-provided strategies.
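The sketch below illustrates one plausible way a user's high-level strategy could be encoded as a soft preference that biases a planner's node evaluation. The preference representation, penalty weights, and example plan are hypothetical and are not taken from the paper.

```python
# Soft preferences as penalty terms in a planner's node evaluation (illustrative only).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SoftPreference:
    description: str
    weight: float                            # penalty applied when the preference is violated
    violated: Callable[[List[str]], bool]    # partial plan -> True if the preference is violated

def evaluate(partial_plan: List[str], path_cost: float, heuristic: float,
             preferences: List[SoftPreference]) -> float:
    """Node evaluation: f = g + h plus penalties for violated soft preferences."""
    penalty = sum(p.weight for p in preferences if p.violated(partial_plan))
    return path_cost + heuristic + penalty

# Hypothetical user strategy: "visit the depot before visiting site B".
prefs = [SoftPreference(
    description="visit depot before site_B",
    weight=10.0,
    violated=lambda plan: "visit_site_B" in plan
    and "visit_depot" not in plan[: plan.index("visit_site_B")],
)]

# A partial plan that violates the preference is pushed down the search queue.
print(evaluate(["load", "visit_site_B"], path_cost=4.0, heuristic=2.5, preferences=prefs))  # 16.5
```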