AITopics | r-max

r-max

Explicit Planning for Efficient Exploration in Reinforcement Learning

Liangpeng Zhang, Ke Tang, Xin Yao

Neural Information Processing Systems, Feb-19-2026, 12:35:05 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

0cc6ee01c82fc49c28706e0918f57e2d-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-7-2026, 11:13:18 GMT

final version, near-optimal sample complexity, planning oracle, (6 more...)

Technology: Information Technology > Artificial Intelligence (0.62)

Explicit Planning for Efficient Exploration in Reinforcement Learning

Liangpeng Zhang, Ke Tang, Xin Yao

Neural Information Processing Systems, Oct-9-2025, 13:59:53 GMT

A straightforward example is as follows.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

North America > United States (0.69)
Asia > China > Guangdong Province (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

We will improve the broader impact section by emphasizing the implications of our theoretical

Neural Information Processing Systems, Oct-9-2025, 13:12:53 GMT

We sincerely thank all the reviewers and feel honored to receive such positive and constructive comments. We will mention total variation distance in the appendix and correct the typo on "Corollary [...]. Note that the smooth planning oracle is not needed throughout the paper, and is thus not the "primary [...]. It is only used in Sec. [...]. We have discussed R-MAX in lines 82-83. By saying "especially model-free ones..." in this sentence, we simply meant [...]. The works on Q-learning in games you mentioned addressed exactly this issue, with non-trivial effort. We will address all the grammatical comments and typos in the final version.

artificial intelligence, broader impact section, final version, (9 more...)

Technology: Information Technology > Artificial Intelligence (0.62)

Increasingly Cautious Optimism for Practical PAC-MDP Exploration

Zhang, Liangpeng (University of Science and Technology of China) | Tang, Ke (University of Science and Technology of China) | Yao, Xin (University of Birmingham)

AAAI Conferences, Jul-15-2015

The exploration strategy is an essential part of learning agents in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proved to have polynomial sample complexity; yet, their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies, namely Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample complexity upper bound. Then, we demonstrate their significantly improved performance through empirical results.
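For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the textbook tabular R-MAX idea, not of ICR or ICV, whose details are not given here. It assumes a small finite MDP; the function name `rmax_q_values`, the threshold `m`, and the toy numbers are all hypothetical. Any state-action pair visited fewer than `m` times is treated optimistically, as if it earned the maximum reward forever, which is what pushes the agent to try it.

```python
def rmax_q_values(n_sa, r_sum, n_sas, m, r_max, gamma=0.95, iters=500):
    """Value iteration on an optimistic R-MAX-style model (illustrative sketch).

    n_sa[s][a]      visit count of (s, a)
    r_sum[s][a]     sum of rewards observed at (s, a)
    n_sas[s][a][t]  count of observed transitions (s, a) -> t
    m               visits needed before (s, a) is "known" (must be >= 1)
    """
    n_states, n_actions = len(n_sa), len(n_sa[0])
    v_max = r_max / (1.0 - gamma)  # value of earning r_max at every step
    q = [[v_max] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        v = [max(row) for row in q]
        new_q = []
        for s in range(n_states):
            row = []
            for a in range(n_actions):
                n = n_sa[s][a]
                if n < m:
                    # Unknown pair: keep the optimistic value.
                    row.append(v_max)
                else:
                    # Known pair: Bellman backup on the empirical model.
                    r_hat = r_sum[s][a] / n
                    exp_v = sum(n_sas[s][a][t] / n * v[t] for t in range(n_states))
                    row.append(r_hat + gamma * exp_v)
            new_q.append(row)
        q = new_q
    return q


# Toy one-state MDP: action 0 is known (mean reward 0.1), action 1 is untried.
q_before = rmax_q_values([[5, 0]], [[0.5, 0.0]], [[[5], [0]]], m=5, r_max=1.0)
# After action 1 has also been tried 5 times and always paid 0.
q_after = rmax_q_values([[5, 5]], [[0.5, 0.0]], [[[5], [5]]], m=5, r_max=1.0)
```

In the first call the untried action keeps the optimistic value, so a greedy agent explores it; in the second call both actions are known and the greedy choice falls back to the action with the higher empirical reward. A larger `m` makes the agent more cautious, which is the behavior the abstract describes as overly cautious in practice.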

artificial intelligence, r-max, upstream oil & gas, (18 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States (0.29)
Europe > United Kingdom (0.14)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Learning to Coordinate Efficiently: A Model-based Approach

Brafman, R. I., Tennenholtz, M.

arXiv.org Artificial Intelligence, Jun-26-2011

Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms. The distributed nature of such systems makes the problem of learning to act in an unknown environment more difficult because the agents must coordinate both their learning process and their action choices. However, the need to coordinate is not restricted to distributed agents, as it arises naturally among self-interested agents in certain environments. A good model for such environments is that of a common-interest stochastic game (CISG). A stochastic game (Shapley, 1953) is a model of multi-agent interactions consisting of multiple finite or infinite stages, in each of which the agents play a one-shot strategic form game. The identity of each stage depends stochastically on the previous stage and the actions performed by the agents in that stage. The goal of each agent is to maximize some function of its reward stream - either its average reward or its sum of discounted rewards. A CISG is a stochastic game in which at each point the payoff of all agents is identical. Various algorithms for learning in CISGs have been proposed in the literature.
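The two objectives named in this excerpt can be written out as follows. This is one common formalization, not notation taken from the paper: the symbols $r_t$ (the reward at stage $t$, shared by all agents in a CISG), $\gamma$ (a discount factor in $[0, 1)$) and $T$ (a horizon length) are assumptions introduced here for illustration.

```latex
% Sum of discounted rewards (gamma is an assumed discount factor, 0 <= gamma < 1)
U_{\text{disc}} = \sum_{t=0}^{\infty} \gamma^{t} \, r_t

% Average reward over the first T stages (T is an assumed horizon)
U_{\text{avg}}(T) = \frac{1}{T} \sum_{t=0}^{T-1} r_t
```

Because the payoff is identical for every agent in a CISG, all agents share the same reward stream $r_t$ and therefore the same objective value under either definition.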

artificial intelligence, machine learning, tion, (15 more...)

doi: 10.1613/jair.1154

1106.5258

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.40)
