Shi, Haosen
Learning Compact Neural Networks with Deep Overparameterised Multitask Learning
Ren, Shen, Shi, Haosen
Compact neural network offers many benefits for real-world applications. However, it is usually challenging to train compact neural networks with small parameter sizes and low computational costs to achieve the same or better model performance compared to more complex and powerful architectures. This is particularly true for multitask learning, with different tasks competing for resources. The left and right singular vectors are trained with all task losses, and the diagonal matrices are trained using task-specific losses. Our design is mainly inspired by analytical studies on overparameterised networks for MTL [Lampinen and Ganguli, 2018], showing that the training/test error dynamics depend on the time-evolving alignment of the network parameters to the singular vectors of the training data, and that a quantifiable task alignment describing the transfer benefits among multiple tasks depends on the singular values and input feature …
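As a rough illustration of the layer design described above, the sketch below factorises each task's weight matrix as W_t = U diag(d_t) V, with U and V shared across tasks and the diagonal d_t private to each task; the module name, shapes and initialisation are assumptions for illustration, not the authors' released code.

    # Minimal sketch (assumption): an SVD-style overparameterised linear layer for
    # multitask learning, mirroring the description above. Not the authors' released
    # implementation; names and initialisation are illustrative.
    import torch
    import torch.nn as nn

    class OverparamSharedLinear(nn.Module):
        def __init__(self, in_features, out_features, n_tasks):
            super().__init__()
            r = min(in_features, out_features)
            # Left/right factors are shared, so every task's loss updates them.
            self.U = nn.Parameter(torch.randn(out_features, r) * 0.01)
            self.V = nn.Parameter(torch.randn(r, in_features) * 0.01)
            # One diagonal per task; only the active task's loss reaches its row.
            self.d = nn.Parameter(torch.ones(n_tasks, r))

        def forward(self, x, task_id):
            # Effective per-task weight: W_t = U diag(d_t) V
            w = self.U @ torch.diag(self.d[task_id]) @ self.V
            return x @ w.t()

Because the forward pass for task t only touches d[t], gradients from that task's loss update U, V and its own diagonal, which matches the shared/task-specific split described above.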
An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Xiao, Changnan, Shi, Haosen, Fan, Jiajun, Deng, Shihong
Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with an ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, no parallel mechanism for policy-based methods achieves all three characteristics. In this paper, we propose an entropy-regularization-free mechanism designed for policy-based methods that achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that the mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
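For context on the "Closed-form Diversity" property referenced above, the sketch below shows the standard ε-greedy construction and why its entropy is a closed-form function of ε and the action count alone (assuming a unique greedy action); it illustrates the value-based baseline property only, not the paper's proposed policy-based mechanism.

    # Minimal sketch (assumption): the epsilon-greedy policy of a value-based agent and
    # its entropy, which depends only on epsilon and the number of actions (assuming a
    # unique greedy action), i.e. it is available in closed form.
    import numpy as np

    def epsilon_greedy_probs(q_values, epsilon):
        # pi(a) = eps/|A| for every action, plus (1 - eps) on the greedy action.
        n = len(q_values)
        probs = np.full(n, epsilon / n)
        probs[int(np.argmax(q_values))] += 1.0 - epsilon
        return probs

    def epsilon_greedy_entropy(n_actions, epsilon):
        # Closed form: H = -(1 - eps + eps/|A|) log(1 - eps + eps/|A|)
        #                  - (|A| - 1) (eps/|A|) log(eps/|A|)
        p_greedy = 1.0 - epsilon + epsilon / n_actions
        p_other = epsilon / n_actions
        h = -p_greedy * np.log(p_greedy)
        if p_other > 0:
            h -= (n_actions - 1) * p_other * np.log(p_other)
        return h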
CASA-B: A Unified Framework of Model-Free Reinforcement Learning
Xiao, Changnan, Shi, Haosen, Fan, Jiajun, Deng, Shihong
Building on recent breakthroughs in reinforcement learning, this paper introduces CASA-B (Critic AS an Actor with Bandits Vote Algorithm), a unified framework of model-free reinforcement learning. CASA-B is an actor-critic framework that estimates the state value, the state-action value and the policy. An expectation-correct Doubly Robust Trace is introduced to learn the state value and the state-action value, with guaranteed convergence properties. We prove that CASA-B integrates a consistent path for policy evaluation and policy improvement. The policy evaluation is equivalent to a compensational policy improvement, which alleviates the function approximation error, and is also equivalent to an entropy-regularized policy improvement, which prevents the policy from collapsing to a suboptimal solution. Building on this design, we find that the entropies of the behavior policies and the target policy are disentangled. Based on this observation, we propose a progressive closed-form entropy control mechanism, which explicitly controls the behavior policies' entropy to an arbitrary range. Our experiments show that CASA-B is highly sample-efficient and achieves state-of-the-art performance on the Arcade Learning Environment. Our mean Human Normalized Score is 6456.63% and our median Human Normalized Score is 477.17%, at a 200M training scale.
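To make the idea of explicitly controlling a behaviour policy's entropy concrete, the sketch below shows one generic way to steer entropy toward a target value by mixing the target policy with a uniform distribution and solving for the mixing weight by bisection; CASA-B's actual progressive closed-form mechanism is not spelled out in this abstract, so this is an illustration of the concept only.

    # Minimal sketch (assumption): steer a behaviour policy's entropy toward a target by
    # mixing the target policy with a uniform distribution; entropy is non-decreasing
    # along this mixture path, so bisection on the mixing weight converges. This is not
    # necessarily the mechanism used by CASA-B.
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def control_behavior_entropy(pi, target_entropy, iters=50):
        # Return pi_b = (1 - alpha) * pi + alpha * uniform with H(pi_b) close to the target.
        n = len(pi)
        uniform = np.full(n, 1.0 / n)
        target_entropy = min(target_entropy, np.log(n))  # entropy cannot exceed log |A|
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            alpha = 0.5 * (lo + hi)
            if entropy((1.0 - alpha) * pi + alpha * uniform) < target_entropy:
                lo = alpha  # need more uniform mass to raise entropy
            else:
                hi = alpha
        alpha = 0.5 * (lo + hi)
        return (1.0 - alpha) * pi + alpha * uniform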