log 4
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Nebraska > Lancaster County > Lincoln (0.04)
- (11 more...)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Oregon (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.34)
Deep Bootstrap
Chang, Jinyuan, Jiao, Yuling, Kang, Lican, Shi, Junjie
As a result, the demand for interval estimation, and consequently for its validity and precision, has grown steadily over time, as reflected in a number of recent studies. For example, in proteomics, confidence intervals are employed to assess the association between post-translational modifications and intrinsically disordered regions of proteins, validating hypotheses derived from predictive models and enabling large-scale functional analyses (Tunyasuvunakool et al., 2021; Bludau et al., 2022). In genomic research, confidence intervals are leveraged to characterize the distribution of gene expression levels, supporting robust inferences about promoter sequence effects and genetic variability (Vaishnav et al., 2022). In environmental science, interval estimation is used to monitor deforestation rates, yielding uncertainty-aware insights critical for climate policy (Bullock et al., 2020). In the social sciences, confidence intervals are utilized to evaluate relationships between socioeconomic factors, strengthening conclusions drawn from census data (Ding et al., 2021).
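To make the interval-estimation task concrete, here is a minimal sketch of a classical percentile bootstrap confidence interval, the standard procedure that "deep" bootstrap variants build on. This is illustrative only: the function name, defaults, and simulated data are assumptions, not the paper's method.

    import numpy as np

    def percentile_bootstrap_ci(data, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
        # Resample the data with replacement, recompute the statistic on each
        # resample, then read off empirical quantiles of the bootstrap distribution.
        rng = np.random.default_rng(seed)
        data = np.asarray(data)
        boot = np.array([stat(rng.choice(data, size=data.size, replace=True))
                         for _ in range(n_boot)])
        lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return lo, hi

    # Example: a 95% interval for the mean of a simulated sample.
    sample = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=200)
    print(percentile_bootstrap_ci(sample))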
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation
Liu, Junyan, Luo, Haipeng, Zhang, Zihan, Ratliff, Lillian J.
We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no external regret is impossible without incurring an exponential dependence on the episode length $H$. They therefore turn to the weaker notion of Nash-value regret and propose a V-learning algorithm with regret $O(K^{2/3})$ after $K$ episodes. However, their algorithm and guarantee do not adapt to the difficulty of the problem: even when the opponent follows a fixed policy, so that $O(\sqrt{K})$ external regret is well known to be achievable, their result still gives the worse rate $O(K^{2/3})$ on a weaker metric. In this work, we fully address both limitations. First, we introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret and naturally reduces to external regret when the opponent follows a fixed policy. Moreover, under this new metric, we propose a parameter-free algorithm that achieves an $O(\min \{\sqrt{K} + (CK)^{1/3},\sqrt{LK}\})$ regret bound, where $C$ quantifies the variance of the opponent's policies and $L$ denotes the number of policy switches (both at most $O(K)$). Therefore, our results not only recover the two extremes -- $O(\sqrt{K})$ external regret when the opponent is fixed and $O(K^{2/3})$ Nash-value regret in the worst case -- but also smoothly interpolate between them by automatically adapting to the opponent's non-stationarity. We achieve this by first providing a new analysis of the epoch-based V-learning algorithm of Mao et al. (2022), establishing an $O(\eta C + \sqrt{K/\eta})$ regret bound, where $\eta$ is the epoch incremental factor. We then show how to adaptively restart this algorithm with an appropriate $\eta$ in response to the potential non-stationarity of the opponent, yielding our final results.
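As a quick sanity check on the stated bound (an instantiation of the abstract's formulas, not an excerpt from the paper): balancing the two terms of $O(\eta C + \sqrt{K/\eta})$ over the epoch factor $\eta$ gives

$$\eta C = \sqrt{K/\eta} \;\Longrightarrow\; \eta = K^{1/3} C^{-2/3} \;\Longrightarrow\; \eta C = \sqrt{K/\eta} = (CK)^{1/3},$$

which explains the $(CK)^{1/3}$ term in the final bound. At the two extremes: for a fixed opponent ($C = 0$, $L = 1$) the bound collapses to $O(\min\{\sqrt{K}, \sqrt{K}\}) = O(\sqrt{K})$, while in the worst case $C = O(K)$ it degrades to $\sqrt{K} + (CK)^{1/3} = O(K^{2/3})$, matching the rates quoted above.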
- North America > United States > California (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hong Kong (0.04)
- Leisure & Entertainment > Games (0.67)
- Education > Educational Setting > Online (0.60)
3e6260b81898beacda3d16db379ed329-Supplemental.pdf
Moreover, we set the initial distribution $\xi_1$ to be uniform over $\mathcal{S}$. As mentioned in the discussion following Theorem 4.1, it holds that $D_{\mathrm{VA}} \leq D_{\mathrm{FQI}}$. These findings also shed light on the minimax optimality of the OPE problem. The bound scaling as $\sum_{h=1}^{H} \|v_h\|_{\Lambda_h^{-1}}$ is tighter. Here, taking the maximum with $1$ is to deal with the situation where $\hat{\mathbb{V}}_h \hat{V}^{\pi}_{h+1}(\cdot,\cdot)$ is close to zero or negative, and the second $1$ accounts for the variance of the rewards.
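For readers parsing the reconstructed notation above (this is the standard convention in the linear-MDP OPE literature, assumed here rather than quoted from the supplement): $\|v\|_{\Lambda^{-1}}$ denotes the weighted Euclidean norm

$$\|v\|_{\Lambda^{-1}} = \sqrt{v^{\top} \Lambda^{-1} v},$$

so $\sum_{h=1}^{H} \|v_h\|_{\Lambda_h^{-1}}$ sums, over horizon steps $h = 1, \dots, H$, the norm of $v_h$ weighted by the inverse of the (regularized) covariance matrix $\Lambda_h$.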