AITopics | bilinear class

Collaborating Authors

bilinear class

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs

Liu, Haolin, Wei, Chen-Yu, Zimmert, Julian

arXiv.org Artificial IntelligenceOct-13-2025

We study decision making with structured observation (DMSO). Previous work (Foster et al., 2021b, 2023a) has characterized the complexity of DMSO via the decision-estimation coefficient (DEC), but left a gap between the regret upper and lower bounds that scales with the size of the model class. To tighten this gap, Foster et al. (2023b) introduced optimistic DEC, achieving a bound that scales only with the size of the value-function class. However, their optimism-based exploration is only known to handle the stochastic setting, and it remains unclear whether it extends to the adversarial setting. We introduce Dig-DEC, a model-free DEC that removes optimism and drives exploration purely by information gain. Dig-DEC is always no larger than optimistic DEC and can be much smaller in special cases. Importantly, the removal of optimism allows it to handle adversarial environments without explicit reward estimators. By applying Dig-DEC to hybrid MDPs with stochastic transitions and adversarial rewards, we obtain the first model-free regret bounds for hybrid MDPs with bandit feedback under several general transition structures, resolving the main open problem left by Liu et al. (2025). We also improve the online function-estimation procedure in model-free learning: For average estimation error minimization, we refine the estimator in Foster et al. (2023b) to achieve sharper concentration, improving their regret bounds from $T^{3/4}$ to $T^{2/3}$ (on-policy) and from $T^{5/6}$ to $T^{7/9}$ (off-policy). For squared error minimization in Bellman-complete MDPs, we redesign their two-timescale procedure, improving the regret bound from $T^{2/3}$ to $\sqrt{T}$. This is the first time a DEC-based method achieves performance matching that of optimism-based approaches (Jin et al., 2021; Xie et al., 2023) in Bellman-complete MDPs.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.08882

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Decision Making in Hybrid Environments: A Model Aggregation Approach

Liu, Haolin, Wei, Chen-Yu, Zimmert, Julian

arXiv.org Machine LearningFeb-9-2025

Recent work by Foster et al. (2021, 2022, 2023b) and Xu and Zeevi (2023) developed the framework of decision estimation coefficient (DEC) that characterizes the complexity of general online decision making problems and provides a general algorithm design principle. These works, however, either focus on the pure stochastic regime where the world remains fixed over time, or the pure adversarial regime where the world arbitrarily changes over time. For the hybrid regime where the dynamics of the world is fixed while the reward arbitrarily changes, they only give pessimistic bounds on the decision complexity. In this work, we propose a general extension of DEC that more precisely characterizes this case. Besides applications in special cases, our framework leads to a flexible algorithm design where the learner learns over subsets of the hypothesis set, trading estimation complexity with decision complexity, which could be of independent interest. Our work covers model-based learning and model-free learning in the hybrid regime, with a newly proposed extension of the bilinear classes (Du et al., 2021) to the adversarial-reward case. We also recover some existing model-free learning results in the pure stochastic regime.

artificial intelligence, complexity, machine learning, (13 more...)

arXiv.org Machine Learning

2502.05974

Country: North America > United States > Virginia (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

Chen, Zixiang, Li, Chris Junchi, Yuan, Angela, Gu, Quanquan, Jordan, Michael I.

arXiv.org Artificial IntelligenceSep-30-2022

Reinforcement learning (RL) is a decision-making process that seeks to maximize the expected reward when an agent interacts with the environment [Sutton and Barto, 2018]. Over the past decade, RL has gained increasing attention due to its successes in a wide range of domains, including Atari games [Mnih et al., 2013], Go game [Silver et al., 2016], autonomous driving [Yurtsever et al., 2020], Robotics [Kober et al., 2013], etc. Existing RL algorithms can be categorized into value-based algorithms such as Q-learning [Watkins, 1989] and policy-based algorithms such as policy gradient [Sutton et al., 1999]. They can also be categorized as a model-free approach where one directly models the value function classes, or alternatively, a model-based approach where one needs to estimate the transition probability. Due to the intractably large state and action spaces that are used to model the real-world complex environment, function approximation in RL has become prominent in both algorithm design and theoretical analysis. It is a pressing challenge to design sample-efficient RL algorithms with general function approximations. In the special case where the underlying Markov Decision Processes (MDPs) enjoy certain linear structures, several lines of works have achieved polynomial sample complexity and/or T regret guarantees under either model-free or model-based RL settings. For linear MDPs where the transition probability and the reward function admit linear structure, Yang and Wang [2019] developed a variant of Q-learning when granted access to a generative model, Jin et al. [2020] proposed an LSVI-UCB algorithm with a Õ( d

dimension, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2209.15634

Country:

Asia > Middle East > Jordan (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Industry:

Leisure & Entertainment > Games > Computer Games (0.54)
Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

The Statistical Complexity of Interactive Decision Making

Foster, Dylan J., Kakade, Sham M., Qian, Jian, Rakhlin, Alexander

arXiv.org Machine LearningDec-26-2021

A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: 1. a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit. 2. a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound matching our lower bound, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches -- both Bayesian and frequentist.

algorithm, decision-estimation coefficient, theorem 3, (15 more...)

arXiv.org Machine Learning

2112.13487

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback

Bilinear Classes: A Structural Framework for Provable Generalization in RL

Du, Simon S., Kakade, Sham M., Lee, Jason D., Lovett, Shachar, Mahajan, Gaurav, Sun, Wen, Wang, Ruosong

arXiv.org Artificial IntelligenceMar-19-2021

This work introduces Bilinear Classes, a new structural framework, which permit generalization in reinforcement learning in a wide variety of settings through the use of function approximation. The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space. Our main result provides an RL algorithm which has polynomial sample complexity for Bilinear Classes; notably, this sample complexity is stated in terms of a reduction to the generalization error of an underlying supervised learning sub-problem. These bounds nearly match the best known sample complexity bounds for existing models. Furthermore, this framework also extends to the infinite dimensional (RKHS) setting: for the the Linear $Q^*/V^*$ model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no explicit dependence on the explicit feature dimension (which could be infinite), but instead depends only on information theoretic quantities.

bilinear class, equation, lemma 5, (17 more...)

arXiv.org Artificial Intelligence

2103.10897

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback