Planning with Submodular Objective Functions
Wang, Ruosong, Zhang, Hanrui, Chaplot, Devendra Singh, Garagić, Denis, Salakhutdinov, Ruslan
arXiv.org Artificial Intelligence, Oct-22-2020
Modern reinforcement learning and planning algorithms have achieved tremendous success on various tasks [Mnih et al., 2015, Silver et al., 2017]. However, most of these algorithms operate within the standard Markov decision process (MDP) framework, where the goal is to maximize the cumulative reward, which makes them difficult to apply to many practical sequential decision-making problems. In this paper, we study planning in generalized MDPs where, instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function.

To motivate our approach, consider the following scenario: a company manufactures cars and, as part of its customer service, continuously monitors the status of every car it has produced. Each car is equipped with a number of sensors, each of which constantly produces noisy measurements of some attribute of the car, e.g., speed, location, or temperature. Due to bandwidth constraints, at any moment each car may transmit data generated by only a single sensor to the company. The goal is to combine the statistics collected over a fixed period of time, presumably from multiple sensors, to gather as much information about the car as possible. One seemingly natural strategy is to transmit only data generated by the most "informative" sensor. However, since a sensor's output remains the same between two consecutive samples, transmitting the same data multiple times is pointless. One may instead order the sensors by their informativeness and always choose the most informative sensor that has not yet transmitted data since the last sample was generated.
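To make the greedy strategy described above concrete, here is a minimal sketch, not the paper's algorithm: it models the objective as a weighted-coverage function, which is monotone and submodular, and at each time step transmits from the sensor with the largest marginal gain. The sensor names, attribute weights, and the sample period in `SENSOR_COVERAGE` and `SAMPLE_PERIOD` are hypothetical placeholders, not values from the paper.

```python
# Hypothetical sketch of the greedy sensor-selection heuristic. The
# objective f below is a weighted-coverage function (monotone submodular);
# all sensors, attributes, and weights are illustrative assumptions.
from typing import Dict, Set, Tuple

SAMPLE_PERIOD = 4  # assumed: each sensor produces a fresh sample every 4 steps

# Assumed attribute weights: how informative each sensor is about each attribute.
SENSOR_COVERAGE: Dict[str, Dict[str, float]] = {
    "gps":         {"location": 1.0, "speed": 0.6},
    "imu":         {"speed": 0.8, "orientation": 0.7},
    "thermometer": {"temperature": 0.9},
}

def f(transmitted: Set[Tuple[int, str]]) -> float:
    """Weighted coverage: for each (sample period, attribute) pair, credit the
    best weight among sensors that transmitted in that period. Re-transmitting
    a sensor within the same period adds nothing, capturing the fact that its
    output is unchanged until the next sample."""
    covered: Dict[Tuple[int, str], float] = {}
    for t, s in transmitted:
        p = t // SAMPLE_PERIOD
        for attr, w in SENSOR_COVERAGE[s].items():
            covered[p, attr] = max(covered.get((p, attr), 0.0), w)
    return sum(covered.values())

def greedy_plan(horizon: int) -> Set[Tuple[int, str]]:
    """At each time step, transmit from the sensor with the largest marginal
    gain f(S + e) - f(S); skip the step if no sensor adds information."""
    chosen: Set[Tuple[int, str]] = set()
    for t in range(horizon):
        base = f(chosen)
        best, best_gain = None, 0.0
        for s in SENSOR_COVERAGE:
            gain = f(chosen | {(t, s)}) - base
            if gain > best_gain:
                best, best_gain = s, gain
        if best is not None:
            chosen.add((t, best))
    return chosen

if __name__ == "__main__":
    plan = greedy_plan(horizon=8)
    for t, s in sorted(plan):
        print(f"t={t}: transmit {s}")
    print("objective value:", f(plan))
```

Running this sketch shows the diminishing-returns behavior the abstract alludes to: within each sample period the planner works down the informativeness ordering (gps, then imu, then thermometer) and stays silent once every attribute is covered. For plain monotone submodular maximization under a cardinality constraint, such greedy selection enjoys the classical (1 - 1/e) approximation guarantee; the planning setting studied in the paper adds temporal structure beyond this toy model.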