approx
- Research Report > New Finding (0.67)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Provable Offline Reinforcement Learning for Structured Cyclic MDPs
Lee, Kyungbok, Sarteau, Angelica Cristello, Kosorok, Michael R.
We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the cycle. In this setting, offline learning is challenging: optimizing a policy at any stage shifts the state distributions of subsequent stages, propagating mismatch across the cycle. To address this, we propose a modular structural framework that decomposes the cyclic process into stage-wise sub-problems. While generally applicable, we instantiate this principle as CycleFQI, an extension of fitted Q-iteration enabling theoretical analysis and interpretation. It uses a vector of stage-specific Q-functions, tailored to each stage, to capture within-stage sequences and transitions between stages. This modular design enables partial control, allowing some stages to be optimized while others follow predefined policies. We establish finite-sample suboptimality error bounds and derive global convergence rates under Besov regularity, demonstrating that CycleFQI mitigates the curse of dimensionality compared to monolithic baselines. Additionally, we propose a sieve-based method for asymptotic inference of optimal policy values under a margin condition. Experiments on simulated and real-world Type 1 Diabetes data sets demonstrate CycleFQI's effectiveness.
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Education > Health & Safety > School Nutrition (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
min
Let $A$ be an $n \times n$ Hermitian matrix and let $B$ be an $(n-1) \times (n-1)$ matrix which is constructed by deleting the $i$-th row and $i$-th column of $A$. Denote $\Phi = [\phi(x_1), \ldots, \phi(x_n)]^\top \in \mathbb{R}^{n \times D}$, where $D$ is the dimension of the feature space $\mathcal{H}$. Performing rank-$n$ singular value decomposition (SVD) on $\Phi$, we have $\Phi = H \Sigma V^\top$, where $H \in \mathbb{R}^{n \times n}$, $\Sigma \in \mathbb{R}^{n \times n}$ is a diagonal matrix whose diagonal elements are the singular values of $\Phi$, and $V \in \mathbb{R}^{D \times n}$. $F(\alpha)$ in Eq. (21) is proven differentiable and the $p$-th component of the gradient is $\partial F(\alpha) / \partial \alpha_p = \ldots$ Then, a reduced gradient descent algorithm [26] is adopted to optimize Eq. (21). The three deep neural networks are pre-trained on ImageNet [5].
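The rank-n SVD in the excerpt above can be checked numerically. The following is a minimal sketch, not the authors' code: the sizes `n` and `D` and the random stand-in for the feature matrix are hypothetical, and it assumes that V enters the factorization transposed so that the stated shapes are compatible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 4, 10  # hypothetical sizes, chosen so that n <= D

# Random stand-in for the n x D feature matrix whose rows are phi(x_i).
Phi = rng.standard_normal((n, D))

# Thin SVD: H is n x n, sigma holds the n singular values, Vt is n x D.
H, sigma, Vt = np.linalg.svd(Phi, full_matrices=False)
Sigma = np.diag(sigma)  # n x n diagonal matrix of singular values
V = Vt.T                # D x n, matching the shape given in the text

# Rebuild Phi from its factors (V is applied transposed).
reconstruction = H @ Sigma @ V.T
print(H.shape, Sigma.shape, V.shape)
```

With `full_matrices=False`, NumPy returns the reduced factors directly, so no truncation step is needed when n <= D.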
Appendices A Bernoulli-CRS Properties
Let us define $K \in \mathbb{R}^{n \times n}$ a random diagonal sampling matrix where $K_{j,j} \sim \mathrm{Bernoulli}(p_j)$ for $1 \le j \le n$. Therefore, Bernoulli-CRS will perform on average the same amount of computations as in the fixed-rank CRS. This formulation immediately hints at the possibility to sample over the input channel dimension, similarly to sampling column-row pairs in matrices. Let $\ell$ be a $\beta$-Lipschitz loss function, and let the network be trained with SGD using properly decreasing learning rate. Let us denote the weight, bias and activation gradients with respect to a loss function $\ell$ by $\nabla W_l$, $\nabla b_l$, $\nabla a_l$ respectively.
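The Bernoulli sampling of column-row pairs described in the excerpt above can be illustrated with a short sketch. It is an assumption-laden illustration, not the paper's implementation: the function names `bernoulli_crs` and `exact_product` are hypothetical, and rescaling each kept pair by 1/p_j is assumed here as the usual way to make such an estimate unbiased, since the excerpt does not state the scaling.

```python
import random


def exact_product(A, B):
    """Plain matrix product of A (m x n) and B (n x k), as lists of lists."""
    m, n, k = len(A), len(B), len(B[0])
    return [[sum(A[r][j] * B[j][c] for j in range(n)) for c in range(k)]
            for r in range(m)]


def bernoulli_crs(A, B, p, rng):
    """Approximate A @ B by keeping column-row pair j with probability p[j].

    Each kept pair is rescaled by 1 / p[j] (an assumption of this sketch),
    so the expected value of the result equals the exact product.
    """
    m, n, k = len(A), len(B), len(B[0])
    out = [[0.0] * k for _ in range(m)]
    for j in range(n):
        # Draw K[j][j] ~ Bernoulli(p[j]), the diagonal sampling entry.
        if rng.random() < p[j]:
            scale = 1.0 / p[j]
            for r in range(m):
                a = A[r][j] * scale
                for c in range(k):
                    out[r][c] += a * B[j][c]
    return out


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

# With every p[j] = 1 all pairs are kept and the estimate is exact.
full = bernoulli_crs(A, B, [1.0, 1.0], random.Random(0))

# With p[j] < 1 some pairs may be dropped; the survivors are scaled up.
sampled = bernoulli_crs(A, B, [0.5, 0.5], random.Random(0))
```

The expected number of kept pairs is the sum of the p_j, which is how a Bernoulli scheme can match, on average, the cost of sampling a fixed number of pairs.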
- Research Report > New Finding (0.47)
- Research Report > Experimental Study (0.47)
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)