A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

Murphy, S. A., Deng, Y., Laber, E. B., Maei, H. R., Sutton, R. S., Witkiewitz, K.

Jul-18-2016, arXiv.org Machine Learning

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. The algorithm is designed with mobile health in mind. In the behavioral health community there is growing interest in, and use of, mobile devices to deliver treatments that target behavior change. Mobile devices can provide treatment when, where, and in the amount desired (Litvin et al., 2013; Kumar et al., 2013). Increasingly, scientists are turning to passive sensing (wearable devices, GPS, smartphone activity) and self-reports of internal states to tailor when, how, and where treatment is delivered to each person.
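
The abstract does not spell out the algorithm. As a generic illustration of the ingredients named in the title (a fixed batch of data, off-policy correction, an actor and a critic, and the average-reward criterion), here is a minimal Python sketch that repeatedly sweeps a batch of transitions pooled from several simulated individuals. It is not the estimator developed in the paper: it swaps in a textbook-style tabular actor-critic that reweights each transition by π_θ(a|s)/μ(a|s), where μ is the behavior policy that generated the data, and uses the average-reward TD error δ = r - ρ + V(s') - V(s). The toy environment, the helper names (`make_toy_batch`, `softmax_policy`, `batch_off_policy_actor_critic`), and the step sizes are all hypothetical.

```python
import math
import random

# Generic sketch only: NOT the estimator from the paper. Toy data and names are hypothetical.


def softmax_policy(theta, s):
    """Target policy pi_theta(. | s): softmax over tabular preferences theta[s]."""
    m = max(theta[s])  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def make_toy_batch(n_individuals=10, horizon=50, n_states=2, seed=0):
    """Hypothetical data: each individual follows a uniform behavior policy mu.

    Action 1 yields reward 1 and action 0 yields reward 0; the next state is
    drawn uniformly at random. Each tuple stores mu(a | s) for importance weighting.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_individuals):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            a = rng.randrange(2)  # behavior policy: uniform over 2 actions
            r = 1.0 if a == 1 else 0.0
            s_next = rng.randrange(n_states)
            data.append((s, a, r, s_next, 0.5))
            s = s_next
    return data


def batch_off_policy_actor_critic(data, n_states, n_actions, epochs=50,
                                  alpha_rho=0.01, alpha_v=0.05, alpha_theta=0.05):
    """Sweep a fixed batch repeatedly, correcting for mu with importance weights."""
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor preferences
    v = [0.0] * n_states  # critic: tabular differential value function
    rho = 0.0             # critic: estimate of the target policy's average reward
    for _ in range(epochs):
        for s, a, r, s_next, mu_prob in data:
            pi = softmax_policy(theta, s)
            w = pi[a] / mu_prob                    # importance weight pi / mu
            delta = r - rho + v[s_next] - v[s]     # average-reward TD error
            rho += alpha_rho * w * delta
            v[s] += alpha_v * w * delta
            for b in range(n_actions):
                # gradient of log pi(a | s) w.r.t. theta[s][b] for a tabular softmax
                grad_log = (1.0 if b == a else 0.0) - pi[b]
                theta[s][b] += alpha_theta * w * delta * grad_log
    return theta, v, rho


if __name__ == "__main__":
    batch = make_toy_batch()
    theta, v, rho = batch_off_policy_actor_critic(batch, n_states=2, n_actions=2)
    for s in range(2):
        print(s, [round(p, 3) for p in softmax_policy(theta, s)])
    print("estimated average reward:", round(rho, 3))
```

In this toy problem only action 1 is rewarded, so the learned target policy should place most of its probability on action 1 in both states and ρ should approach 1, even though every action in the batch was chosen uniformly at random by μ. Tables keep the sketch short; with richer context, such as the sensed states described above, the actor and critic would typically use features instead.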

artificial intelligence, machine learning, reinforcement learning
