Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
