A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward
Murphy, S. A., Deng, Y., Laber, E. B., Maei, H. R., Sutton, R. S., Witkiewitz, K.
We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view toward its use in mobile health. In the behavioral health communities, there is increasing interest in, and use of, mobile devices to deliver treatments that target behavior change. Mobile devices can be used to provide treatment when, where, and in the amount desired (Litvin et al., 2013; Kumar et al., 2013). Increasingly, scientists are looking to passive sensing (wearable devices, GPS, activity on the smartphone) and self-report of internal states to individualize the intervention to the person in terms of when, how, and where to deliver treatment.
Jul-18-2016
- Country:
- North America > United States (0.93)
- Genre:
- Research Report
- Experimental Study (1.00)
- Strength High (0.68)