Sequential decision making is a typical problem in reinforcement learning with plenty of algorithms to solve it. However, only a few of them can work effectively with a very small number of observations. In this report, we describe our progress in learning a policy for malaria control, framed as a reinforcement learning problem in the KDD Cup Challenge 2019, and propose several solutions to the limited-observations problem. We apply a Genetic Algorithm, Bayesian Optimization, and Q-learning with sequence breaking to find the optimal policy for five consecutive years with only 20 episodes/100 evaluations. We evaluate these algorithms and compare their performance with Random Search as a baseline. Among these algorithms, Q-learning with sequence breaking was submitted to the challenge and ranked 7th in the KDD Cup.
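To make the evaluation budget concrete, the following is a minimal sketch of a Random Search baseline, not the submitted method. It reads the budget as 20 episodes of five yearly decisions each (100 evaluations), which is an interpretation. The two-dimensional yearly action in [0, 1]^2 and the function `toy_episode_reward` are hypothetical stand-ins for the challenge simulator.

```python
import random


def toy_episode_reward(policy):
    """Hypothetical stand-in for the simulator: scores a multi-year policy.

    Each yearly action is a pair (a, b) in [0, 1]^2. The optimum at
    (0.3, 0.7) is arbitrary and chosen only for illustration.
    """
    return sum(1.0 - (a - 0.3) ** 2 - (b - 0.7) ** 2 for a, b in policy)


def random_search(n_episodes=20, n_years=5, seed=0):
    """Sample whole policies at random and keep the best one seen."""
    rng = random.Random(seed)
    best_policy, best_reward = None, float("-inf")
    evaluations = 0
    for _ in range(n_episodes):
        # One episode = one action per year.
        policy = [(rng.random(), rng.random()) for _ in range(n_years)]
        reward = toy_episode_reward(policy)
        evaluations += n_years  # each yearly decision counts as one evaluation
        if reward > best_reward:
            best_policy, best_reward = policy, reward
    return best_policy, best_reward, evaluations


if __name__ == "__main__":
    policy, reward, evaluations = random_search()
    print(f"best reward {reward:.3f} after {evaluations} evaluations")
```

With so few episodes, any method must extract more information per evaluation than this baseline does, which is the motivation for the other algorithms compared.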
Epidemiology simulations have become a fundamental tool in the fight against the epidemics of various infectious diseases like AIDS and malaria. However, the complicated and stochastic nature of these simulators can mean their output is difficult to interpret, which reduces their usefulness to policymakers. In this paper, we introduce an approach that allows one to treat a large class of population-based epidemiology simulators as probabilistic generative models. This is achieved by hijacking the internal random number generator calls, through the use of a universal probabilistic programming system (PPS). In contrast to other methods, our approach can be easily retrofitted to simulators written in popular industrial programming frameworks. We demonstrate that our method can be used for interpretable introspection and inference, thus shedding light on black-box simulators. This reinstates much-needed trust between policymakers and evidence-based methods.
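The idea of intercepting a simulator's random number generator calls can be illustrated with a small sketch. This is not a probabilistic programming system and not the authors' implementation: `TracingRNG` and `toy_outbreak` are hypothetical names, and the simulator is assumed to draw all of its randomness through an object passed in as `rng`.

```python
import random


class TracingRNG:
    """Wraps a random source, records every draw, and can force chosen draws."""

    def __init__(self, seed=0, forced=None):
        self._rng = random.Random(seed)
        # Maps draw index -> value to return instead of a fresh sample.
        self._forced = dict(forced or {})
        self.trace = []

    def random(self):
        index = len(self.trace)
        if index in self._forced:
            value = self._forced[index]
        else:
            value = self._rng.random()
        self.trace.append(value)
        return value


def toy_outbreak(rng, population=10, infection_prob=0.3):
    """Hypothetical simulator: counts individuals infected in one step."""
    return sum(1 for _ in range(population) if rng.random() < infection_prob)


if __name__ == "__main__":
    rng = TracingRNG(seed=1)
    infected = toy_outbreak(rng)
    print(infected, rng.trace)
    # Replaying the recorded trace reproduces the same simulator output.
    replay = TracingRNG(seed=99, forced=dict(enumerate(rng.trace)))
    print(toy_outbreak(replay) == infected)
```

Once every draw is addressable in this way, an inference engine could in principle propose values for individual draws and score the resulting outputs, which is the role a PPS would play.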
Large-scale computational experiments, often running over weeks and over large datasets, are used extensively in fields such as epidemiology, meteorology, computational biology, and healthcare to understand phenomena and to design high-stakes policies affecting everyday health and the economy. For instance, the OpenMalaria framework is a computationally intensive simulation used by various nongovernmental and governmental agencies to understand the spread of malaria and the effectiveness of intervention strategies, and subsequently to design healthcare policies. Given that such shared results form the basis of inferences drawn, technological solutions designed, and day-to-day policies drafted, it is essential that the computations are validated and trusted. In particular, in a multi-agent environment involving several independent computing agents, a notion of trust in results generated by peers is critical in facilitating transparency, accountability, and collaboration. Using a novel combination of distributed validation of atomic computation blocks and a blockchain-based immutable audit mechanism, this work proposes a universal framework for distributed trust in computations. In particular, we address the scalability problem by reducing the storage and communication costs using a lossy compression scheme. This framework guarantees not only verifiability of final results, but also the validity of local computations, and its cost-benefit tradeoffs are studied using a synthetic example of training a neural network. Machine learning, data science, and large-scale computations in general have created an era of computation-driven inference, applications, and policymaking. Technological solutions and policies with far-reaching consequences are increasingly being derived from computational frameworks and data.
Multi-agent sociotechnical systems that are tasked with working collaboratively on such tasks function by interactively sharing data, models, and results of local computation. However, when such agents are independent and lack trust, they might not collaborate with other agents or trust the validity of their reported computations. Quite often, these computations are also expensive and time-consuming, and thus infeasible for the doubting peer to recompute as a general course of action.
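A minimal sketch of how lossily compressed computation states could be recorded in a tamper-evident log is given below. It is an illustration under stated assumptions, not the proposed framework: uniform quantization stands in for the lossy compression scheme, a SHA-256 hash chain stands in for the blockchain, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def quantize(values, step=0.01):
    """Lossy compression (assumed scheme): snap each value to a grid of width `step`."""
    return [round(v / step) for v in values]


def block_hash(prev_hash, quantized_state):
    """Hash a block together with its predecessor's hash."""
    payload = json.dumps({"prev": prev_hash, "state": quantized_state}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(states, step=0.01):
    """Record each intermediate state as a block linked to the previous one."""
    chain, prev = [], GENESIS_HASH
    for state in states:
        compressed = quantize(state, step)
        prev = block_hash(prev, compressed)
        chain.append({"state": compressed, "hash": prev})
    return chain


def verify_chain(chain):
    """Return True if no recorded block has been altered after the fact."""
    prev = GENESIS_HASH
    for block in chain:
        if block_hash(prev, block["state"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain([[0.10, 0.20], [0.15, 0.25]])
    print(verify_chain(chain))
    chain[0]["state"][0] += 1  # tamper with a recorded state
    print(verify_chain(chain))
```

A validating peer could recompute a single block from its predecessor and compare `quantize(recomputed)` with the recorded state, so only individual blocks rather than the whole computation need to be redone. Values that fall near a quantization boundary may need a tolerance in that comparison.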
The task of decision-making under uncertainty is daunting, especially for problems of significant complexity. Healthcare policy makers across the globe face problems under challenging constraints, with limited tools to help them make data-driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent-based strategies to explore the policy space. We apply Gaussian Process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference to human expert decisions for the same simulated population. Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture more nuanced policy spaces in the future.
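The stochastic multi-armed bandit framing can be sketched as follows. The abstract does not name its three strategies, so epsilon-greedy is used here purely as a generic example. The discretization into three arms, the reward means, and the noise level in `toy_pull` are hypothetical, and the Gaussian Process regression step is omitted.

```python
import random

# Hypothetical expected rewards of three candidate policies (arms).
TRUE_MEANS = [0.2, 0.5, 0.8]


def toy_pull(arm, rng, noise=0.05):
    """Stand-in for one noisy simulator run of the chosen policy."""
    return rng.gauss(TRUE_MEANS[arm], noise)


def epsilon_greedy(pull, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Estimate each arm's mean reward while mostly playing the best arm so far."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(n_rounds):
        if t < n_arms:
            arm = t  # try every arm once before exploiting
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = pull(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return means, counts


if __name__ == "__main__":
    means, counts = epsilon_greedy(toy_pull, n_arms=3, n_rounds=500)
    print([round(m, 2) for m in means], counts)
```

Because each simulator run is noisy, the running averages play the role of a very simple reward model; a regression model over the policy space would additionally share information between neighbouring policies.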
We build a deep reinforcement learning (RL) agent that can predict the likelihood of an individual testing positive for malaria by asking questions about their household. The RL agent learns to determine which survey question to ask next and when to stop to make a prediction about their likelihood of malaria based on their responses hitherto. The agent incurs a small penalty for each question asked, and a large reward/penalty for making the correct/wrong prediction; it thus has to learn to balance the length of the survey with the accuracy of its final predictions. Our RL agent is a Deep Q-network that learns a policy directly from the responses to the questions, with an action defined for each possible survey question and for each possible prediction class. We focus on Kenya, where malaria is a massive health burden, and train the RL agent on a dataset of 6481 households from the Kenya Malaria Indicator Survey 2015. To investigate the importance of having survey questions be adaptive to responses, we compare our RL agent to a supervised learning (SL) baseline that fixes its set of survey questions a priori. We evaluate on prediction accuracy and on the number of survey questions asked on a holdout set and find that the RL agent is able to predict with 80% accuracy, using only 2.5 questions on average. In addition, the RL agent learns to survey adaptively to responses and is able to match the SL baseline in prediction accuracy while significantly reducing survey length.
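The reward structure described above, with one action per survey question and one per prediction class, can be sketched as a small episodic environment. This is an illustrative toy, not the trained Deep Q-network: the class name `SurveyEnv`, the binary label, and the numeric values of the question cost and prediction rewards are all assumptions.

```python
class SurveyEnv:
    """Toy episodic environment: ask questions, then predict a binary label."""

    def __init__(self, answers, label, question_cost=0.05,
                 correct_reward=1.0, wrong_reward=-1.0):
        self.answers = list(answers)          # hidden survey responses
        self.label = label                    # 1 = positive, 0 = negative (assumed)
        self.question_cost = question_cost    # assumed small per-question penalty
        self.correct_reward = correct_reward  # assumed reward for a correct prediction
        self.wrong_reward = wrong_reward      # assumed penalty for a wrong prediction
        self.state = [None] * len(self.answers)  # None marks an unasked question
        self.done = False

    @property
    def n_actions(self):
        # One action per question plus one per prediction class.
        return len(self.answers) + 2

    def step(self, action):
        """Apply an action and return (state, reward, done)."""
        if self.done:
            raise RuntimeError("episode has already finished")
        n_questions = len(self.answers)
        if action < n_questions:
            # Asking a question reveals its answer at a small cost.
            self.state[action] = self.answers[action]
            return list(self.state), -self.question_cost, False
        # Any remaining action is a prediction and ends the episode.
        predicted = action - n_questions
        self.done = True
        correct = predicted == self.label
        reward = self.correct_reward if correct else self.wrong_reward
        return list(self.state), reward, True


if __name__ == "__main__":
    env = SurveyEnv(answers=[1, 0, 1], label=1)
    print(env.step(0))  # ask the first question
    print(env.step(4))  # predict class 1 and end the episode
```

An agent in this setting maximizes its return by asking only the questions whose answers change its prediction, which is the trade-off between survey length and accuracy that the abstract describes.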