Supplementary Material (Appendix): Bayesian Deep Ensembles via the Neural Tangent Kernel

A Recap of standard and NTK parameterisations
We see that the different parameterisations yield the same distribution for the functional output f(·, θ) at initialisation, but give different scalings to the parameter gradients in the backward pass. In the infinite width limit this is distributed as GP(0, Θ^L) and is independent of f_0(·). Let X_0 be an arbitrary test set. In fact, even with a heteroscedastic prior θ ~ N(0, Λ), where Λ ∈ R^{p×p}_{+} is a diagonal matrix with diagonal entries {λ_j}_{j=1}^{p}, it is straightforward to show that the correct setting of the regulariser is ||θ||²_Λ = θ^⊤ Λ^{-1} θ in order to obtain a posterior sample of θ. For an NN in the linearised regime [23], this is related to the fact that the NTK and standard parameterisations initialise parameters differently, yet yield the same functional distribution for a randomly initialised NN.
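To make the regulariser concrete, the following is a minimal sketch in the linear-Gaussian analogue of the linearised-NN setting: with prior θ ~ N(0, Λ) and Gaussian observation noise, minimising the data-fit loss plus the Λ^{-1}-weighted regulariser (with a randomly drawn prior anchor and perturbed targets, in the randomised-MAP style) yields an exact posterior draw. All sizes, variable names, and the feature matrix below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian analogue: f(x) = phi(x) @ theta,
# prior theta ~ N(0, Lambda) with diagonal Lambda.
p, n, sigma2 = 5, 20, 0.1 ** 2          # illustrative sizes and noise level
lam = np.linspace(0.5, 2.0, p)          # diagonal entries of Lambda
Phi = rng.standard_normal((n, p))       # stand-in for the feature/Jacobian matrix
y = Phi @ rng.normal(0.0, np.sqrt(lam)) + np.sqrt(sigma2) * rng.standard_normal(n)

def posterior_sample():
    # Randomised-MAP sampling: draw an anchor from the prior, perturb the
    # targets, then exactly minimise
    #   ||y_tilde - Phi theta||^2 / sigma2 + (theta - theta0)^T Lambda^{-1} (theta - theta0),
    # whose minimiser is a draw from the posterior over theta.
    theta0 = rng.normal(0.0, np.sqrt(lam))                 # anchor ~ N(0, Lambda)
    y_tilde = y + np.sqrt(sigma2) * rng.standard_normal(n) # perturbed targets
    A = Phi.T @ Phi / sigma2 + np.diag(1.0 / lam)          # posterior precision
    b = Phi.T @ y_tilde / sigma2 + theta0 / lam
    return np.linalg.solve(A, b)

theta = posterior_sample()
```

The key point matching the text: the regulariser weights each coordinate by 1/λ_j, i.e. it is θ^⊤ Λ^{-1} θ, not θ^⊤ Λ θ.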
Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial, because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors accordingly. Motivated by the theoretical findings, we advocate that the ensemble method can be combined with Model Identification Adaptive Control (MIAC) for effective ensemble size adaptation. Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized ensemble method with two key steps: (a) approximation error characterization, which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias. Extensive experiments are carried out to show that AdaEQ improves learning performance over existing methods on the MuJoCo benchmark.
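The feedback loop described in the abstract can be sketched as follows. This is a simplified illustration of the two steps, not the paper's implementation: the target takes a min over a random subset of Q-estimates (larger subsets push the bias down, smaller ones push it up), and a hypothetical error-feedback rule adjusts the subset size based on an externally supplied bias estimate. All function names, bounds, and the tolerance are assumptions for illustration.

```python
import numpy as np

def min_ensemble_target(q_next, reward, gamma, K, rng):
    """Bellman target using the min over K randomly chosen Q-estimates.

    q_next: array of shape (M,) holding the next-state action value from
    each of the M approximators in the ensemble.
    """
    subset = rng.choice(len(q_next), size=K, replace=False)
    return reward + gamma * np.min(q_next[subset])

def adapt_ensemble_size(bias_estimate, K, K_min=2, K_max=10, tol=0.1):
    """Hypothetical error-feedback rule: grow the in-target ensemble when
    the estimated bias is positive (overestimation), shrink it when the
    bias is negative (underestimation), and hold it otherwise."""
    if bias_estimate > tol:
        return min(K + 1, K_max)
    if bias_estimate < -tol:
        return max(K - 1, K_min)
    return K
```

In this sketch the bias estimate plays the role of the MIAC-style feedback signal: the controller (ensemble size) is adjusted until the measured error is driven toward zero.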