Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs
Dan Qiao, Ming Yin, Yu-Xiang Wang
arXiv.org Artificial Intelligence
In many real-world reinforcement learning (RL) tasks, limited computing resources make it challenging to apply fully adaptive algorithms that continually update the exploration policy. As a surrogate, it is more cost-effective to collect data in large batches using the current policy and make changes to the policy only after the entire batch is completed. For example, in a recommendation system [Afsar et al., 2021], it is easy to gather new data quickly, but deploying a new policy takes longer, as it requires significant computing and human resources. It is therefore not feasible to switch policies based on real-time data, as typical RL algorithms would require. A practical solution is to run several experiments in parallel and decide on policy updates only after the entire batch has been completed. Similar limitations occur in other RL-based applications such as healthcare [Yu et al., 2021], robotics [Kober et al., 2013], and new material design [Zhou et al., 2019], where the agent must minimize the number of policy updates while still learning an effective policy using a similar number of trajectories as fully adaptive methods. On the theoretical side, Bai et al. [2019] introduced the notion of switching cost, which measures the number of policy updates.
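For concreteness, the switching cost can be written as a simple count. A minimal formulation, assuming $K$ total episodes and writing $\pi_k$ for the policy deployed in episode $k$ (notation chosen here for illustration, not taken from the abstract):

% Switching cost: the number of episode boundaries at which the
% deployed policy changes, out of K episodes.
\[
  N_{\mathrm{switch}} \;\triangleq\; \sum_{k=1}^{K-1} \mathbb{1}\left\{ \pi_{k+1} \neq \pi_{k} \right\}.
\]

Under this count, a fully adaptive algorithm may switch after every episode, incurring a cost as large as $K-1$, whereas the logarithmic switching cost in the title refers to algorithms that update the policy only on the order of $\log K$ times (possibly up to additional problem-dependent factors) while remaining sample-efficient.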
Feb-24-2023