Collaborating Authors: Shani, Guy


Reinforcement Learning on AYA Dyads to Enhance Medication Adherence

arXiv.org Artificial Intelligence

Medication adherence is critical for the recovery of adolescents and young adults (AYAs) who have undergone hematopoietic cell transplantation (HCT). However, maintaining adherence is challenging for AYAs after hospital discharge, as they experience both individual barriers (e.g., physical and emotional symptoms) and interpersonal barriers (e.g., relational difficulties with their care partner, who is often involved in medication management). To optimize the effectiveness of a three-component digital intervention targeting both members of the dyad as well as their relationship, we propose a novel Multi-Agent Reinforcement Learning (MARL) approach to personalize the delivery of interventions. By incorporating domain knowledge, the MARL framework, in which each agent is responsible for the delivery of one intervention component, allows for faster learning than a flattened single agent. Evaluation in a dyadic simulator environment built from real clinical data shows a significant improvement in medication adherence (approximately 3%) compared to purely random intervention delivery. The effectiveness of this approach will be further evaluated in an upcoming trial.
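
As an illustration of the factored design, the following sketch pairs one bandit-style agent with each intervention component, with all agents sharing the observed adherence reward. The ComponentAgent class, the epsilon-greedy rule, and the reward signal are illustrative assumptions, not the paper's algorithm.

```python
import random

# Hypothetical sketch: one bandit-style agent per intervention component
# (target person, care partner, relationship), in contrast to a single
# "flattened" agent over the joint 2^3 action space.
class ComponentAgent:
    """Epsilon-greedy agent deciding whether to deliver one component."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.values = {0: 0.0, 1: 0.0}   # estimated reward per action
        self.counts = {0: 0, 1: 0}

    def act(self):
        if random.random() < self.epsilon:
            return random.choice([0, 1])
        return max(self.values, key=self.values.get)

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n

agents = {name: ComponentAgent() for name in
          ("target_person", "care_partner", "relationship")}

# One decision point: each agent picks its own component independently,
# then every agent is updated with the shared adherence reward.
joint_action = {name: agent.act() for name, agent in agents.items()}
reward = random.random()  # placeholder for a simulated adherence outcome
for name, agent in agents.items():
    agent.update(joint_action[name], reward)
```

Because each agent learns over a two-action space rather than the joint space, value estimates accumulate faster, which is the intuition behind the reported speedup over a flattened agent.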


Heuristics for Partially Observable Stochastic Contingent Planning

arXiv.org Artificial Intelligence

Acting to complete tasks in stochastic, partially observable domains is an important problem in artificial intelligence, and is often formulated as a goal-based POMDP. Goal-based POMDPs can be solved using the RTDP-BEL algorithm, which operates by running forward trajectories from the initial belief to the goal. These trajectories can be guided by a heuristic, and more accurate heuristics can result in significantly faster convergence. In this paper, we develop a heuristic function that leverages the structured representation of domain models. We compute, in a relaxed space, a plan to achieve the goal, taking into account both the value of information and the stochastic effects. We provide experiments showing that while our heuristic is slower to compute, it requires an order of magnitude fewer trajectories before convergence. Overall, it thus speeds up RTDP-BEL, particularly on problems where significant information gathering is needed.
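
A minimal sketch of the RTDP-BEL trial loop that such a heuristic would guide is given below. The step, is_goal, and heuristic interfaces are assumed placeholders, and beliefs are assumed hashable; this is not the paper's implementation.

```python
# Sketch of one RTDP-BEL-style trial: run a forward trajectory from the
# initial belief toward the goal, acting greedily and backing up values.
# Unseen beliefs are initialized from the heuristic, which is where a
# more informed heuristic pays off.
def rtdp_bel_trial(b0, actions, step, is_goal, heuristic, V, gamma=0.95):
    belief = b0
    while not is_goal(belief):
        # Score each action by sampled cost plus the value of the
        # successor belief; the heuristic seeds values of new beliefs.
        def q(a):
            cost, next_belief = step(belief, a)  # one sampled outcome
            v = V.get(next_belief, heuristic(next_belief))
            return cost + gamma * v, next_belief
        scored = {a: q(a) for a in actions}
        best = min(scored, key=lambda a: scored[a][0])
        V[belief] = scored[best][0]   # Bellman backup at the belief
        belief = scored[best][1]      # follow the greedy action
```

Fewer trials are needed when heuristic(next_belief) is close to the true cost-to-go, which is the trade-off the abstract describes: a slower heuristic that cuts the number of trajectories by an order of magnitude.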


Dyadic Reinforcement Learning

arXiv.org Machine Learning

Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily lives. The involvement of care partners and social support networks often proves crucial in helping individuals manage burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship between a target person and their care partner -- with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and the past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The developed dyadic RL is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL, and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and a realistic test bed constructed from data collected in a mobile health study.
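
For intuition about the Bayesian, online flavor of the method, here is a toy Gaussian Thompson sampler over intervention options for the dyad. The conjugate-Gaussian model and all names are illustrative assumptions, not the paper's hierarchical construction or its regret analysis.

```python
import numpy as np

# Toy sketch: Thompson sampling with a Gaussian posterior per option.
class BayesArm:
    """Gaussian posterior over the mean reward of one intervention option."""
    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
        self.mean, self.var, self.noise_var = prior_mean, prior_var, noise_var

    def sample(self):
        return np.random.normal(self.mean, np.sqrt(self.var))

    def update(self, reward):
        # Conjugate Gaussian posterior update.
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var + reward / self.noise_var) / precision
        self.var = 1.0 / precision

# One arm per option for the dyad (e.g., deliver an intervention or not).
arms = [BayesArm(), BayesArm()]
for t in range(100):
    choice = max(range(len(arms)), key=lambda i: arms[i].sample())
    reward = np.random.normal(0.2 * choice, 1.0)  # toy environment
    arms[choice].update(reward)
```

The hierarchical element of dyadic RL, sharing statistical strength across time intervals and dyad members, would sit on top of a posterior-sampling core like this one.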


Rollout Heuristics for Online Stochastic Contingent Planning

arXiv.org Artificial Intelligence

Partially observable Markov decision processes (POMDPs) are a useful model for decision-making under partial observability and stochastic actions. Partially Observable Monte-Carlo Planning (POMCP) is an online algorithm for deciding on the next action to perform, using a Monte-Carlo tree search approach based on the UCT (UCB applied to trees) algorithm for fully observable Markov decision processes. POMCP develops an action-observation tree and, at the leaves, uses a rollout policy to provide a value estimate for the leaf. As such, POMCP is highly dependent on the rollout policy to compute good estimates, and hence to identify good actions. Thus, many practitioners who use POMCP must create strong, domain-specific heuristics. In this paper, we model POMDPs as stochastic contingent planning problems. This allows us to leverage domain-independent heuristics developed in the planning community. We suggest two heuristics: the first is based on the well-known h_add heuristic from classical planning, and the second is computed in belief space, taking the value of information into account.
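
The following sketch shows the kind of heuristic-guided rollout this enables: rather than sampling actions uniformly at random, the rollout greedily follows the action whose sampled successor has the lowest h_add-style estimate. The simulate, h_add, and is_goal interfaces are assumed placeholders, not the paper's code.

```python
# Sketch of a heuristic-guided rollout for a POMCP leaf: the relaxed-
# planning estimate h_add steers the rollout instead of uniform random
# action selection, yielding a less noisy leaf value estimate.
def heuristic_rollout(state, actions, simulate, h_add, is_goal,
                      max_depth=50, gamma=0.95):
    """Return a discounted value estimate for a leaf state."""
    total, discount = 0.0, 1.0
    for _ in range(max_depth):
        if is_goal(state):
            break
        # Sample one outcome per action, then greedily pick the action
        # whose successor looks cheapest under the relaxed heuristic.
        outcomes = {a: simulate(state, a) for a in actions}  # (reward, s')
        best = min(actions, key=lambda a: h_add(outcomes[a][1]))
        reward, state = outcomes[best]
        total += discount * reward
        discount *= gamma
    return total
```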


Partial Disclosure of Private Dependencies in Privacy Preserving Planning

arXiv.org Artificial Intelligence

In collaborative privacy preserving planning (CPPP), a group of agents jointly creates a plan to achieve a set of goals while preserving each other's privacy. During planning, agents often reveal to other agents the private dependencies between their public actions, that is, which public action facilitates the preconditions of another public action. Previous work in CPPP does not limit the disclosure of such dependencies. In this paper, we explicitly limit the number of disclosed dependencies, allowing agents to publish only a part of their private dependencies. We investigate different strategies for deciding which dependencies to publish and how they affect the ability to find solutions. We evaluate the ability of two solvers -- distributed forward search and centralized planning based on a single-agent projection -- to produce plans under this constraint. Experiments over standard CPPP domains show that the proposed dependency-sharing strategies enable generating plans while sharing only a small fraction of all private dependencies.
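
One plausible dependency-sharing strategy can be sketched as follows: rank private dependencies by a usefulness score and publish only the top k. The scoring rule and data structures here are illustrative assumptions, not the specific strategies evaluated in the paper.

```python
# Sketch of a top-k dependency-sharing strategy: publish only the k
# private dependencies that look most useful to the other agents.
def select_dependencies(dependencies, usefulness, k):
    """dependencies: list of (producer_action, consumer_action) pairs.
    usefulness: dict mapping a dependency to an assumed usefulness score,
    e.g., how many other public actions it helps enable."""
    ranked = sorted(dependencies, key=lambda d: usefulness.get(d, 0),
                    reverse=True)
    return ranked[:k]

deps = [("load_truck1", "drive"), ("unload", "deliver"), ("refuel", "drive")]
scores = {("load_truck1", "drive"): 3, ("unload", "deliver"): 1}
print(select_dependencies(deps, scores, k=2))  # publishes 2 of 3 dependencies
```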


A difficulty ranking approach to personalization in E-learning

arXiv.org Artificial Intelligence

The prevalence of e-learning systems and online courses has made educational material widely accessible to students of varying abilities and backgrounds. There is thus a growing need to accommodate individual differences in e-learning systems. This paper presents EduRank, an algorithm for personalizing educational content that combines collaborative filtering with voting methods. EduRank constructs a difficulty ranking for each student by aggregating the rankings of similar students, using different aspects of their performance on common questions, including grades, number of retries, and time spent solving questions. It infers a difficulty ranking directly over the questions for each student, rather than ordering them according to the student's predicted score. The EduRank algorithm was tested on two data sets containing thousands of students and a million records, where it outperformed state-of-the-art ranking approaches as well as a domain expert. EduRank was also used by students in a classroom activity, where a prior model was incorporated to predict the difficulty rankings of students with no prior history in the system. It led students to solve more difficult questions than an ordering by a domain expert, without reducing their performance.
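
The ranking-aggregation step can be illustrated with a Borda-style vote over the rankings of similar students, weighted by similarity. EduRank's actual voting rule and similarity weighting may differ; this is only a sketch of the idea.

```python
from collections import defaultdict

# Illustrative sketch: aggregate the difficulty rankings of similar
# students with a similarity-weighted Borda count.
def aggregate_difficulty(neighbor_rankings, weights):
    """neighbor_rankings: list of question-id lists, easiest first.
    weights: similarity weight of each neighboring student."""
    scores = defaultdict(float)
    for ranking, w in zip(neighbor_rankings, weights):
        n = len(ranking)
        for pos, question in enumerate(ranking):
            scores[question] += w * (n - pos)   # higher score = easier
    # Return the target student's predicted ranking, easiest first.
    return sorted(scores, key=scores.get, reverse=True)

rankings = [["q1", "q2", "q3"], ["q2", "q1", "q3"]]
print(aggregate_difficulty(rankings, weights=[0.9, 0.5]))
```

Note that the aggregation orders the questions directly, without first predicting a per-question score for the student, matching the abstract's distinction between ranking and score prediction.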


MDP-Based Cost Sensitive Classification Using Decision Trees

AAAI Conferences

In classification, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real-world cases, testing the value of an attribute incurs a cost, and there can also be a cost associated with misclassifying an instance. Cost-sensitive classification attempts to minimize the expected cost of classification by deciding, after each observed attribute value, which attribute to measure next. In this paper we suggest Markov decision processes as a modeling tool for cost-sensitive classification. We construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our MDP. At each step we decide on the next attribute to measure, balancing the cost of the measurement against the classification accuracy. We compare our approach to a set of previous approaches, showing that it works better across a range of misclassification costs.
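
A toy version of the MDP view is sketched below: states are sets of measured attributes, and at each state the expected cost of classifying now is compared against paying to measure one more attribute. All costs and the misclassification model are illustrative placeholders; in the paper, the accuracy at each state comes from decision trees built per attribute subset.

```python
# Toy sketch of cost-sensitive classification as an MDP over sets of
# measured attributes: stop and classify, or pay to measure more.
attributes = ["a", "b"]
test_cost = {"a": 1.0, "b": 2.0}

def misclassification_cost(measured):
    # Placeholder: expected misclassification cost given what we know.
    # In the paper this would be derived from a decision tree built
    # over exactly this subset of attributes.
    return 10.0 / (1 + len(measured))

def value(measured):
    """Expected cost of acting optimally from this state."""
    classify_now = misclassification_cost(measured)
    remaining = [a for a in attributes if a not in measured]
    measure = [test_cost[a] + value(measured | {a}) for a in remaining]
    return min([classify_now] + measure)

print(value(frozenset()))  # expected cost of the optimal policy
```

The optimal policy falls out of the same recursion: at each state, classify if the stopping cost achieves the minimum, otherwise measure the attribute that does.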


Increased Privacy with Reduced Communication in Multi-Agent Planning

AAAI Conferences

Multi-agent forward search (MAFS) is a state-of-the-art privacy-preserving planning algorithm. We describe a new variant of MAFS, called multi-agent forward-backward search (MAFBS), that uses both forward and backward messages to reduce the number of messages sent and to obtain new privacy properties. While MAFS requires agents to send a state s produced by an action a to all agents that can apply any action in s, MAFBS sends such messages forward only to agents that have an action requiring one of the effects of a. To achieve completeness, it sends messages backward to agents that can supply a missing precondition. This more focused message-passing scheme reduces the number of states exchanged and requires agents to be aware only of the other agents they directly interact with, leading to agent privacy.
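
The forward routing rule can be sketched as a simple recipient filter: after applying a public action, the resulting state is sent only to agents that have some action requiring one of that action's effects. The data structures below are simplified assumptions, not the planner's internal representation.

```python
# Sketch of MAFBS's forward routing rule: route a new state only to
# agents whose actions need one of the producing action's effects.
def forward_recipients(action_effects, agent_preconditions):
    """action_effects: set of facts produced by action a.
    agent_preconditions: dict agent -> set of facts its actions need."""
    return [agent for agent, needed in agent_preconditions.items()
            if needed & action_effects]

effects = {"at_pkg_depot", "truck_empty"}
needs = {"agent1": {"at_pkg_depot"}, "agent2": {"fuel_full"}}
print(forward_recipients(effects, needs))  # only agent1 receives the state
```

Under plain MAFS, both agents would receive the state as long as they could apply some action in it; the filter above is what cuts the message count, with backward requests restoring completeness when a needed precondition never arrives.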


Online Macro Generation for Privacy Preserving Planning

AAAI Conferences

Agents that use Multi-Agent Forward Search (MAFS) to perform privacy-preserving planning often repeatedly develop similar paths. We describe a simple technique for online macro generation that allows agents to reuse successful previous action sequences. By focusing on sequences that end with a single public action, we are able to address the utility problem: our technique has negligible cost, yet provides both speedups and reduced communication in domains where agents have a reasonable number of private actions. We describe two variants of our approach, both with attractive privacy-preserving properties, and demonstrate the value of macros empirically. We also show that one variant is equivalent to secure MAFS.
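
A sketch of the extraction step: scan a successful plan and store each private prefix together with the single public action that closes it as a reusable macro. The string-based action representation is an assumption for illustration.

```python
# Sketch of online macro extraction: each stored macro is a sequence of
# private actions followed by exactly one public action, so replaying it
# sends a single message while skipping the private search in between.
def extract_macros(plan, is_public):
    macros, current = [], []
    for action in plan:
        current.append(action)
        if is_public(action):
            if len(current) > 1:   # private prefix + one public action
                macros.append(tuple(current))
            current = []           # start collecting the next segment
    return macros

plan = ["p_move1", "p_load", "PUB_deliver", "p_move2", "PUB_report"]
print(extract_macros(plan, is_public=lambda a: a.startswith("PUB_")))
```

Keeping exactly one public action per macro is what keeps the bookkeeping cheap, which is how the technique sidesteps the classic utility problem of macro libraries.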


Stronger Privacy Preserving Projections for Multi-Agent Planning

AAAI Conferences

Collaborative privacy-preserving planning (CPPP) is a multi-agent planning task in which agents need to achieve a common set of goals without revealing certain private information. In many CPPP algorithms, the individual agents reason about a projection of the multi-agent problem onto a single-agent classical planning problem. For example, an agent can plan as if it controls the public actions of other agents, ignoring their unknown private preconditions and effects, and use the cost of this plan as a heuristic for the cost of the full multi-agent plan. Such a projection, however, ignores some dependencies between agents' public actions; in particular, it does not contain dependencies between the actions of other agents caused by their private facts. We propose a projection in which these private dependencies are maintained. We demonstrate the benefit of our dependency-preserving projection by using it to produce high-level plans in a new privacy-preserving planner that solves more benchmark problems than any other state-of-the-art privacy-preserving planner. This more informed projection does not explicitly share private information. In addition, we show that even if an adversary agent knows that an agent has some private objects of a given type (e.g., trucks), it cannot infer how many such private objects the agent controls. This introduces a novel, strong form of privacy that is motivated by real-world requirements.
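
The idea can be sketched by compiling each private dependency into an artificial, content-free fact that orders the two public actions without revealing the private facts behind it. The data structures below are illustrative, not the paper's encoding.

```python
# Sketch of a dependency-preserving projection: each private dependency
# between two public actions becomes an artificial fact, so the
# single-agent projection keeps the ordering constraint while the
# private facts themselves stay hidden.
def project_with_dependencies(public_actions, private_deps):
    """private_deps: (producer, consumer) pairs among public actions."""
    actions = {a: {"pre": set(), "eff": set()} for a in public_actions}
    for i, (producer, consumer) in enumerate(private_deps):
        token = f"dep_{i}"                 # artificial, content-free fact
        actions[producer]["eff"].add(token)
        actions[consumer]["pre"].add(token)
    return actions

proj = project_with_dependencies(
    ["load", "drive", "unload"], [("load", "drive"), ("drive", "unload")])
print(proj["drive"])   # requires dep_0, provides dep_1
```

Because the tokens carry no content, an observer learns that a dependency exists but not which private facts, or how many private objects, induce it, which is the intuition behind the stronger privacy guarantee.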