AITopics | q-learning 0

Collaborating Authors

q-learning 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

5eeb693f46d753e5fe24c97212c22bd2-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 08:25:23 GMT

agent, psrl 0, q-learning 0, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)

Add feedback

A Colosseum

Neural Information Processing SystemsAug-15-2025, 05:20:08 GMT

The performances of the agents are in line with the results reported in Osband et al.

agent, psrl 0, q-learning 0, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)

Add feedback

Hardness in Markov Decision Processes: Theory and Practice

Conserva, Michelangelo, Rauber, Paulo

arXiv.org Artificial IntelligenceOct-24-2022

Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard (challenging) environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments to conduct such analysis, which is partially due to the lack of a widespread understanding of the rich theory of hardness of environments. The goal of this paper is to unlock the practical usefulness of this theory through four main contributions. First, we present a systematic survey of the theory of hardness, which also identifies promising research directions. Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis and implements a principled benchmark composed of environments that are diverse with respect to different measures of hardness. Third, we present an empirical analysis that provides new insights into computable measures. Finally, we benchmark five tabular agents in our newly proposed benchmark. While advancing the theoretical understanding of hardness in non-tabular reinforcement learning remains essential, our contributions in the tabular setting are intended as solid steps towards a principled non-tabular benchmark. Accordingly, we benchmark four agents in non-tabular versions of Colosseum environments, obtaining results that demonstrate the generality of tabular hardness measures.

machine learning, mg-empty, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2210.13075

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games > Computer Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.83)

Add feedback

Reinforcement Learning for Joint V2I Network Selection and Autonomous Driving Policies

Yan, Zijiang, Tabassum, Hina

arXiv.org Artificial IntelligenceAug-3-2022

Vehicle-to-Infrastructure (V2I) communication is becoming critical for the enhanced reliability of autonomous vehicles (AVs). However, the uncertainties in the road-traffic and AVs' wireless connections can severely impair timely decision-making. It is thus critical to simultaneously optimize the AVs' network selection and driving policies in order to minimize road collisions while maximizing the communication data rates. In this paper, we develop a reinforcement learning (RL) framework to characterize efficient network selection and autonomous driving policies in a multi-band vehicular network (VNet) operating on conventional sub-6GHz spectrum and Terahertz (THz) frequencies. The proposed framework is designed to (i) maximize the traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration) from autonomous driving perspective, and (ii) maximize the data rates and minimize handoffs by jointly controlling the vehicle's motion dynamics and network selection from telecommunication perspective. We cast this problem as a Markov Decision Process (MDP) and develop a deep Q-learning based solution to optimize the actions such as acceleration, deceleration, lane-changes, and AV-base station assignments for a given AV's state. The AV's state is defined based on the velocities and communication channel states of AVs. Numerical results demonstrate interesting insights related to the inter-dependency of vehicle's motion dynamics, handoffs, and the communication data rate. The proposed policies enable AVs to adopt safe driving behaviors with improved connectivity.

data rate, handoff, probability, (14 more...)

arXiv.org Artificial Intelligence

2208.02249

Country: North America > Canada (0.04)

Genre: Research Report (0.70)

Industry:

Transportation > Ground > Road (0.82)
Information Technology > Robotics & Automation (0.82)
Automobiles & Trucks (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Boosting Algorithms for Estimating Optimal Individualized Treatment Rules

Wang, Duzhe, Fu, Haoda, Loh, Po-Ling

arXiv.org Machine LearningJan-31-2020

The proposed algorithms are based on the XGBoost algorithm, which is known as one of the most powerful algorithms in the machine learning literature. Our main idea is to model the conditional mean of clinical outcome or the decision rule via additive regression trees, and use the boosting technique to estimate each single tree iteratively. Our approaches overcome the challenge of correct model specification, which is required in current parametric methods. The major contribution of our proposed algorithms is providing efficient and accurate estimation of the highly nonlinear and complex optimal individualized treatment rules that often arise in practice. Finally, we illustrate the superior performance of our algorithms by extensive simulation studies and conclude with an application to the real data from a diabetes Phase III trial. 1 Introduction Precision medicine, as an emerging medical approach for disease treatment and prevention, has received more and more attention among government, healthcare industry and academia in recent years. It is a well-known fact that there exists a significant heterogeneity for patients in response to treatments. For example, as demonstrated in [9], for patients who are infected with human immunodeficiency virus and tuberculosis, their optimal timing of antiretroviral therapy (ART) varies significantly.

algorithm, individualized treatment rule, learning, (15 more...)

arXiv.org Machine Learning

2002.00079

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.94)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Add feedback