dvrl
How Reinforcement Learning Can Help In Data Valuation
It is well established that machine learning models perform better with well-curated large scale data. However, collecting and curating is one of the biggest challenges right now. There are billion-dollar companies like Scale.ai who set up their shop with the sole purpose to annotate data. The whole data collection process is so tedious that it has become profitable for few. But, we are still talking about what happens before data arrives at an ML pipeline.
Data Valuation using Reinforcement Learning
Yoon, Jinsung, Arik, Sercan O., Pfister, Tomas
Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4) robust learning. To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation using Reinforcement Learning (DVRL). We employ a data value estimator (modeled by a deep neural network) to learn how likely each datum is used in training of the predictor model. We train the data value estimator using a reinforcement signal of the reward obtained on a small validation set that reflects performance on the target task. We demonstrate that DVRL yields superior data value estimates compared to alternative methods across different types of datasets and in a diverse set of application scenarios. The corrupted sample discovery performance of DVRL is close to optimal in many regimes (i.e. as if the noisy samples were known apriori), and for domain adaptation and robust learning DVRL significantly outperforms state-of-the-art by 14.6% and 10.8%, respectively.
Deep Variational Reinforcement Learning for POMDPs
Igl, Maximilian, Zintgraf, Luisa, Le, Tuan Anh, Wood, Frank, Whiteson, Shimon
Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.