Supplementary Materials A Proof of Theorem 2: Asymptotic Convergence of Robust Q-Learning
From [Borkar and Meyn, 2000], we know that the stochastic approximation (18) converges to the fixed point of T, i.e., Q*. Finally, to show Theorem 3, we only need to bound each term in (56). In this section we develop the finite-time analysis of the robust TDC algorithm. We note that several recent works [Srikant and Ying, 2019, Xu and Liang, 2021, Kaledin et al., 2020] give finite-time analyses of RL algorithms without requiring a projection step. Specifically, the problem in [Srikant and Ying, 2019] is one-time-scale linear stochastic approximation.
A Equivalence of G-B
In our notation, the model in Dasgupta et al. [4] would have score function F, as presented in Dasgupta et al. The model was proposed and analyzed in Dasgupta et al. Remark 2. The proof is by direct calculation. The following lemma will be helpful in proving the next part. Given a threshold T and temperature hyperparameters, there exists a bijection on the set of parameterizations {V
Supplementary Materials A Proof of Theorem 2: Asymptotic Convergence of Robust Q-Learning
(15), which is the expectation of the estimated update in line 5 of Algorithm 1.

A.1 Robust Bellman operator is a contraction

It was shown in [Iyengar, 2005, Roy et al., 2017] that the robust Bellman operator is a contraction. Here, for completeness, we include the proof for our R-contamination uncertainty set. In this section, we develop the finite-time analysis of Algorithm 1.

B.1 Notations

We first introduce some notation. (44) Hence, from the Bernstein inequality [Li et al., 2020], we have that |k. This completes the proof.

Lemma 4. For any t ≤ T, |k

In this section we prove Theorem 4. First note that for any x, y ∈ R. In this section we develop the finite-time analysis of the robust TDC algorithm. For the convenience of the proof, we add a projection step to the algorithm, i.e., we let θ. The approach in [Kaledin et al., 2020] transforms the

D.1 Lipschitz Smoothness

In this section, we first show that J(θ) is Lipschitz.
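The contraction claim for the R-contamination set can be checked numerically. Below is a minimal sketch (not the paper's code) that applies the standard R-contamination robust Bellman operator, T Q = r + γ[(1 - R) P V + R min V] with V(s) = max_a Q(s, a), to a small randomly generated MDP; the MDP, its sizes, and the contamination level R are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma, R = 5, 3, 0.9, 0.2          # hypothetical small MDP; R = contamination level

P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)        # nominal transition kernel p(s'|s,a)
r = rng.random((S, A))                   # reward table

def robust_bellman(Q):
    """R-contamination robust Bellman operator: the adversary mixes the
    nominal next-state distribution with a worst-case point mass."""
    V = Q.max(axis=1)                    # greedy value per state
    nominal = P @ V                      # (S, A) expected next-state value
    return r + gamma * ((1 - R) * nominal + R * V.min())

# Numerically check the gamma-contraction in the sup norm.
Q1, Q2 = rng.random((S, A)), rng.random((S, A))
lhs = np.abs(robust_bellman(Q1) - robust_bellman(Q2)).max()
rhs = gamma * np.abs(Q1 - Q2).max()
print(f"||TQ1-TQ2||_inf = {lhs:.4f} <= gamma*||Q1-Q2||_inf = {rhs:.4f}")
```

Because T is a γ-contraction, iterating it from any initial Q table converges to the unique robust fixed point.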
Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
Behnamnia, Armin, Aminian, Gholamali, Aghaei, Alireza, Shi, Chengchun, Tan, Vincent Y. F., Rabiee, Hamid R.
Off-policy learning and evaluation leverage logged bandit feedback datasets, which contain context, action, propensity score, and feedback for each data point. These scenarios face significant challenges due to high variance and poor performance with low-quality propensity scores and heavy-tailed reward distributions. We address these issues by introducing a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators. Our LSE estimator demonstrates variance reduction and robustness under heavy-tailed conditions. For off-policy evaluation, we derive upper bounds on the estimator's bias and variance. In the off-policy learning scenario, we establish bounds on the regret -- the performance gap between our LSE estimator and the optimal policy -- assuming bounded $(1+\epsilon)$-th moment of weighted reward. Notably, we achieve a convergence rate of $O(n^{-\epsilon/(1+\epsilon)})$ for the regret bounds, where $\epsilon \in [0,1]$ and $n$ is the size of logged bandit feedback dataset. Theoretical analysis is complemented by comprehensive empirical evaluations in both off-policy learning and evaluation scenarios, confirming the practical advantages of our approach. The code for our estimator is available at the following link: https://github.com/armin-behnamnia/lse-offpolicy-learning.
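As a rough illustration of why a log-sum-exponential pools heavy-tailed weighted rewards more stably than a plain average, here is a hedged numpy sketch; the estimator form (1/λ)·log((1/n)·Σ exp(λ·z_i)) with a negative temperature λ, the Pareto-tailed data, and the choice λ = -0.5 are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def lse_estimate(weighted_rewards, lam):
    """Log-sum-exponential pooling of weighted rewards.

    Computes (1/lam) * log( (1/n) * sum_i exp(lam * z_i) ) in a
    numerically stable (shifted) way.  As lam -> 0 this recovers the
    plain sample mean (an IPS-style estimate); lam < 0 damps the
    influence of large outliers.
    """
    z = np.asarray(weighted_rewards, dtype=float)
    a = lam * z
    m = a.max()                          # shift for numerical stability
    return (m + np.log(np.mean(np.exp(a - m)))) / lam

rng = np.random.default_rng(0)
# Heavy-tailed "importance weight times reward" samples with big outliers.
z = rng.pareto(1.5, size=10_000) * rng.uniform(0.0, 1.0, size=10_000)

ips = z.mean()                           # plain sample mean
lse = lse_estimate(z, lam=-0.5)          # LSE with a negative temperature

# By Jensen's inequality, the LSE with lam < 0 never exceeds the mean:
# it trades a controlled downward bias for reduced sensitivity to the tail.
print(f"sample mean: {ips:.3f}   LSE(-0.5): {lse:.3f}")
```

The shift by the maximum before exponentiating is the usual log-sum-exp trick that prevents overflow/underflow.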
An $(\epsilon,\delta)$-accurate level set estimation with a stopping criterion
Ishibashi, Hideaki, Matsui, Kota, Kutsukake, Kentaro, Hino, Hideitsu
The level set estimation problem seeks to identify regions within a set of candidate points where the value of an unknown, costly-to-evaluate function exceeds a specified threshold, providing an efficient alternative to exhaustive evaluations of function values. Traditional methods often use sequential optimization strategies to find $\epsilon$-accurate solutions, which permit a margin around the threshold contour but frequently lack effective stopping criteria, leading to excessive exploration and inefficiencies. This paper introduces an acquisition strategy for level set estimation that incorporates a stopping criterion, ensuring the algorithm halts when further exploration is unlikely to yield improvements, thereby reducing unnecessary function evaluations. We theoretically prove that our method satisfies $\epsilon$-accuracy with a confidence level of $1 - \delta$, addressing a key gap in existing approaches. Furthermore, we show that this also leads to guarantees on the lower bounds of performance metrics such as F-score. Numerical experiments demonstrate that the proposed acquisition function achieves comparable precision to existing methods while confirming that the stopping criterion effectively terminates the algorithm once adequate exploration is completed.
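The idea of classifying candidate points with an $\epsilon$ margin around the threshold and stopping once no point is left undecided can be sketched with a toy Gaussian-process loop. Everything below (the RBF kernel, the test function, the threshold, the β·σ confidence intervals, and the max-variance acquisition) is a simplified stand-in, not the paper's acquisition function or stopping rule.

```python
import numpy as np

def rbf(A, B, ls=0.15):
    # Squared-exponential kernel on 1-D inputs.
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

def gp_posterior(X_obs, y_obs, X_all, noise=1e-5):
    # Exact GP regression posterior mean/stddev on the candidate grid.
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_all, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

f = lambda x: np.sin(4 * x)            # hypothetical black-box function
theta, eps, beta = 0.3, 0.1, 3.0       # threshold, accuracy margin, CI width
X = np.linspace(0.0, 1.0, 100)         # candidate points
idx = [0, len(X) - 1]                  # start from the two endpoints
undecided = set(range(len(X)))
above, below = set(), set()

for _ in range(150):
    mu, sd = gp_posterior(X[idx], f(X[idx]), X)
    for i in list(undecided):
        if mu[i] - beta * sd[i] > theta - eps:    # confidently >= theta - eps
            above.add(i); undecided.discard(i)
        elif mu[i] + beta * sd[i] < theta + eps:  # confidently <= theta + eps
            below.add(i); undecided.discard(i)
    if not undecided:                  # stopping criterion: nothing left to decide
        break
    idx.append(max(undecided, key=lambda i: sd[i]))  # most uncertain point next

print(f"evaluations: {len(idx)}, above: {len(above)}, below: {len(below)}")
```

Because a point only needs to be resolved up to the margin, any point whose whole confidence interval sits within $\epsilon$ of the threshold can be classified either way, which is what lets the loop terminate instead of exploring forever.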
Early-Stopped Mirror Descent for Linear Regression over Convex Bodies
Wegel, Tobias, Kur, Gil, Rebeschini, Patrick
Early-stopped iterative optimization methods are widely used as alternatives to explicit regularization, and direct comparisons between early-stopping and explicit regularization have been established for many optimization geometries. However, most analyses depend heavily on the specific properties of the optimization geometry or strong convexity of the empirical objective, and it remains unclear whether early-stopping could ever be less statistically efficient than explicit regularization for some particular shape constraint, especially in the overparameterized regime. To address this question, we study the setting of high-dimensional linear regression under additive Gaussian noise when the ground truth is assumed to lie in a known convex body and the task is to minimize the in-sample mean squared error. Our main result shows that for any convex body and any design matrix, up to an absolute constant factor, the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body. We achieve this by constructing algorithmic regularizers based on the Minkowski functional of the convex body.
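A minimal numpy sketch of the phenomenon: plain gradient descent on least squares (i.e., mirror descent with the squared-ℓ2 potential, matching an ℓ2-ball convex body), tracing the in-sample risk over iterations. The dimensions, noise level, ground-truth norm, and step size are illustrative assumptions, and this is not the paper's Minkowski-functional construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 100, 80, 1.0               # hypothetical sizes / noise level
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
w_star *= 0.5 / np.linalg.norm(w_star)   # ground truth in a small l2-ball
y = X @ w_star + sigma * rng.standard_normal(n)

# Gradient descent on the least-squares objective, started at zero.
eta, T = 0.1, 2000
w = np.zeros(d)
risks = []                               # in-sample MSE  ||X(w_t - w*)||^2 / n
for t in range(T):
    risks.append(np.sum((X @ (w - w_star)) ** 2) / n)
    w -= eta * (2.0 / n) * X.T @ (X @ w - y)
risks = np.array(risks)

best_t = int(np.argmin(risks))
# Running to convergence recovers the unconstrained least-squares solution
# (risk about sigma^2 * d/n); stopping earlier acts as regularization.
print(f"risk at t=0: {risks[0]:.3f}, best (t={best_t}): {risks.min():.3f}, "
      f"final: {risks[-1]:.3f}")
```

With d/n close to 1 and a small ground-truth norm, the risk curve dips well below the fully converged (least-squares) risk, which is the early-stopping-as-regularization effect the abstract compares against explicitly constrained estimators.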