q-function
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses the problem of dynamics-model errors degrading learning performance. By dynamically interpolating between model rollouts of various horizon lengths for each individual example, STEVE ensures that the model is only utilized when doing so does not introduce significant errors.
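The interpolation between rollout horizons described in this abstract can be illustrated with a small sketch. The abstract does not say how the interpolation weights are chosen. The sketch assumes inverse-variance weighting over an ensemble of estimates per horizon, and the function name `interpolate_targets` and the data layout are hypothetical, chosen only for illustration.

```python
from statistics import fmean, pvariance


def interpolate_targets(candidates, eps=1e-8):
    """Combine per-horizon target estimates by inverse-variance weighting.

    candidates[h] is a list of ensemble estimates of the value target for
    rollout horizon h (a hypothetical layout, assumed for this sketch).
    Returns the interpolated target and the normalized per-horizon weights.
    """
    means = [fmean(c) for c in candidates]
    # Population variance across ensemble members; eps avoids division by zero.
    inv_vars = [1.0 / (pvariance(c) + eps) for c in candidates]
    total = sum(inv_vars)
    weights = [w / total for w in inv_vars]
    target = sum(w * m for w, m in zip(weights, means))
    return target, weights


# Horizons 0, 1, and 2: the ensemble disagrees more as the rollout gets longer,
# so longer horizons should receive less weight.
candidates = [
    [1.00, 1.02, 0.98],
    [1.10, 0.90, 1.00],
    [2.00, 0.00, 1.00],
]
target, weights = interpolate_targets(candidates)
```

Under this assumption, a horizon whose ensemble members disagree strongly contributes little to the final target, which matches the stated goal of using the model only when it does not introduce significant errors.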
Country:
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > Middle East > Jordan (0.04)
Country:
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Country:
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Denmark (0.04)
- Asia > Middle East > Jordan (0.04)
Industry:
- Health & Medicine (0.67)
- Information Technology > Security & Privacy (0.45)
Country:
- North America > United States (0.14)
- Europe > Russia (0.04)
- Europe > Hungary (0.04)
Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Country:
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we study the fine-tuning problem in the context of conservative offline RL methods and devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning.
Country:
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Montana (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Country:
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > Germany (0.04)
Industry:
- Transportation (0.68)
- Automobiles & Trucks (0.46)
Country:
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
Genre:
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Genre:
- Research Report > New Finding (0.67)
- Overview (0.67)