RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Recent work on automated data augmentation strategies has led to state-of-the-art results in image classification and object detection. An obstacle to large-scale adoption of these methods is that they require a separate and expensive search phase. A common way to overcome the expense of the search phase has been to use a smaller proxy task.
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Many problems in machine learning reduce to learning a probability distribution (or policy) over sequences of discrete actions so as to maximize a downstream utility function. Examples include generating text sequences to maximize a task-specific metric like BLEU and generating action sequences in reinforcement learning (RL) to maximize expected return.
start with common concerns and then respond to individual reviewer comments as space permits: Common: There should be a baseline using MCTS and assuming access to simulator / common random numbers
Thank you for the thoughtful and careful reviews. We hope the AC nominates some of you for reviewer awards. There should be a baseline using MCTS and assuming access to simulator / common random numbers. There appears to be some imprecision in reviews about what this means. Then environment stochasticity is re-sampled and the algorithm repeats.