Supplementary Material for Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

A Ensemble gradient diversification

Apr-25-2026, 13:16:29 GMT–Neural Information Processing Systems

Proposition 1. Suppose $Q_{\phi_j}(s,a) = Q(s,a)$ and $Q_{\phi_j}(s,\cdot)$ is locally linear in the neighborhood of $a$ for all $j \in [N]$. Let $\lambda_{\min}$ and $w_{\min}$ be the smallest eigenvalue and the corresponding normalized eigenvector of the matrix $\mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))$, and let $\epsilon > 0$ be the value such that $\min_{i \neq j} \langle \nabla_a Q_{\phi_i}(s,a), \nabla_a Q_{\phi_j}(s,a) \rangle = 1 - \epsilon$.

We first prove that the smallest eigenvalue $\lambda_{\min}$ of $\mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))$ is upper-bounded by some constant multiple of $\epsilon$. By Lemma 1, the total variance of the matrix is less than or equal to $\frac{N-1}{N}\epsilon$. Note that, using the fact that the Q-values coincide at the action $a$ and the local linearity of the Q-functions, we have

$$\mathrm{Var}\big(Q_{\phi_j}(s, a + kw)\big) = k^2\, w^\top \mathrm{Var}\big(\nabla_a Q_{\phi_j}(s,a)\big)\, w. \tag{2}$$

Plugging $w = w_{\min}$ into Equation (2) and using Equation (1), we have

$$\mathrm{Var}\big(Q_{\phi_j}(s, a + k w_{\min})\big) = k^2\, w_{\min}^\top \mathrm{Var}\big(\nabla_a Q_{\phi_j}(s,a)\big)\, w_{\min} = k^2 \lambda_{\min}.$$

A.2 Relationship between maximizing the total variance and maximizing the smallest eigenvalue

As we have shown in Section 4, maximizing the total variance of the matrix $\mathrm{Var}(\nabla_a Q_{\phi_i}(s,a))$ is equivalent to minimizing the cosine similarity of all distinct pairs of the gradients $\nabla_a Q_{\phi_i}(s,a)$, which makes the gradients uniformly distributed on the unit sphere $S^{|\mathcal{A}|-1}$.
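The two identities above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code. It assumes exactly linear Q-functions around $a$, unit-norm action gradients, and population (1/N) variances. The ensemble size, action dimension, step size `k`, and shared value `q_at_a` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, action_dim = 8, 3  # hypothetical ensemble size and action dimension

# Unit-norm action gradients, one row per ensemble member (assumption).
grads = rng.normal(size=(N, action_dim))
grads /= np.linalg.norm(grads, axis=1, keepdims=True)

# Population covariance (1/N normalisation) of the gradients.
cov = np.cov(grads, rowvar=False, bias=True)

q_at_a = 1.7  # hypothetical Q-value shared by all members at the action a


def q_values(k, w):
    """Q-values of all members at a + k*w under exact local linearity."""
    return q_at_a + k * (grads @ w)


# Equation (2) for an arbitrary unit direction w.
k = 0.5
w = rng.normal(size=action_dim)
w /= np.linalg.norm(w)
lhs = np.var(q_values(k, w))  # variance across ensemble members
rhs = k**2 * (w @ cov @ w)

# Along the eigenvector of the smallest eigenvalue the variance is k^2 * lambda_min.
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
lam_min, w_min = eigvals[0], eigvecs[:, 0]
var_along_w_min = np.var(q_values(k, w_min))

# Total variance expressed through the pairwise cosine similarities.
total_var = np.trace(cov)
gram = grads @ grads.T  # entries are cosine similarities (unit-norm rows)
off_diag_sum = gram.sum() - np.trace(gram)
total_var_from_cos = (N - 1) / N - off_diag_sum / N**2

# epsilon as defined in Proposition 1: the minimum pairwise inner product is 1 - eps.
eps = 1.0 - gram[~np.eye(N, dtype=bool)].min()

print(np.isclose(lhs, rhs))
print(np.isclose(var_along_w_min, k**2 * lam_min))
print(np.isclose(total_var, total_var_from_cos))
print(total_var <= (N - 1) / N * eps)
```

Under these assumptions the total variance equals $\frac{N-1}{N}$ minus the sum of the pairwise cosine similarities divided by $N^2$, so lowering the cosine similarities raises the total variance. The last check compares the total variance against $\frac{N-1}{N}\epsilon$ for the sampled gradients.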

