AITopics | effective scheduling

Appendix for On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems, Apr-25-2026, 00:31:26 GMT

We call c(m) the m-step concentrability of a future-state distribution and call C_{ρ,µ} the discounted-average concentrability coefficient of the future-state distributions. The class of MDPs that satisfies this concentrability assumption is quite large, as further discussed in Munos and Szepesvári [18]. If X_i, i = 1,...,N is an i.i.d. [...] And when q = 1, N is used instead of N_1. From the definition, one can easily see that Nq,FX1:N N. Lemma A.2 (Single Iteration Error Bound). Let V_k and V_{k+1} be the value functions of iterations k and k+1, and V_max = r_max/(1 − γ).
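The constant V_max = r_max/(1 − γ) in the excerpt above is the usual upper bound on a discounted value function. A short derivation, assuming rewards are bounded as |r_t| ≤ r_max and the discount factor satisfies 0 ≤ γ < 1 (standard assumptions that the excerpt does not state explicitly):

```latex
\[
\bigl| V^{\pi}(s) \bigr|
  = \Bigl| \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\Big|\, s_{0} = s \Bigr] \Bigr|
  \le \sum_{t=0}^{\infty} \gamma^{t} r_{\max}
  = \frac{r_{\max}}{1 - \gamma}
  = V_{\max}.
\]
```

For example, with r_max = 1 and γ = 0.99 the bound is V_max = 100.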

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Genre: Research Report > Experimental Study (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems, Dec-23-2025, 20:28:47 GMT

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule the important hyperparameters to achieve adequate performance, such as the real data ratio for policy optimization in Dyna-style model-based algorithms. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters in training the model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO shows consistency with our theoretical analysis.
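To make the idea of a gradually increasing real data ratio concrete, here is a minimal Python sketch of how a Dyna-style training loop might mix real and model-generated transitions in each policy-update batch. It is not the AutoMBPO method: the abstract describes a learned schedule, whereas this sketch swaps in a fixed linear ramp for illustration. The function names (`real_ratio`, `sample_mixed_batch`), the endpoints `start=0.05` and `end=0.5`, and the list-based buffers are all hypothetical.

```python
import random


def real_ratio(step, total_steps, start=0.05, end=0.5):
    """Hypothetical linear schedule: ramp the real data ratio from start to end."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + frac * (end - start)


def sample_mixed_batch(real_buffer, model_buffer, batch_size, ratio, rng=random):
    """Draw a batch in which roughly `ratio` of the samples are real transitions."""
    n_real = min(int(round(batch_size * ratio)), batch_size)
    n_model = batch_size - n_real
    # Sample with replacement from each buffer (an assumption made for simplicity).
    real = [rng.choice(real_buffer) for _ in range(n_real)]
    model = [rng.choice(model_buffer) for _ in range(n_model)]
    return real + model


if __name__ == "__main__":
    # Tagged placeholder transitions so the origin of each sample is visible.
    real_buffer = [("real", i) for i in range(100)]
    model_buffer = [("model", i) for i in range(1000)]
    for step in (0, 50, 100):
        ratio = real_ratio(step, total_steps=100)
        batch = sample_mixed_batch(real_buffer, model_buffer, 8, ratio)
        n_real = sum(1 for tag, _ in batch if tag == "real")
        print(step, n_real, len(batch))
```

A learned scheduler would replace `real_ratio` with a controller that outputs the ratio from training statistics; the batch-mixing step would stay the same.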

effective scheduling, model-based reinforcement learning, name change, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling

Oh, Youngmin, Park, Jinje, Kim, Seunggeun, Paik, Taejin, Pan, David, Hwang, Bosun

arXiv.org Artificial Intelligence, Nov-24-2024

Recent advancements in reinforcement learning (RL) for analog circuit optimization have demonstrated significant potential for improving sample efficiency and generalization across diverse circuit topologies and target specifications. However, challenges remain, such as high computational overhead and the need for bespoke models for each circuit. To address them, we propose M3, a novel Model-based RL (MBRL) method employing the Mamba architecture and effective scheduling. The Mamba architecture, known as a strong alternative to the transformer architecture, enables multi-circuit optimization with distinct parameters and target specifications. The effective scheduling strategy enhances sample efficiency by adjusting crucial MBRL training parameters. To the best of our knowledge, M3 is the first method for multi-circuit optimization that leverages both the Mamba architecture and an MBRL with effective scheduling. As a result, it significantly improves sample efficiency compared to existing RL methods.

large language model, machine learning, natural language, (18 more...)

2411.16019

Country:

Europe (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems, Oct-9-2024, 17:21:56 GMT

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule the important hyperparameters to achieve adequate performance, such as the real data ratio for policy optimization in Dyna-style model-based algorithms. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters in training the model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO shows consistency with our theoretical analysis.

effective scheduling, hyperparameter, model-based reinforcement learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
