Goto

Collaborating Authors

 q-learning 0




Hardness in Markov Decision Processes: Theory and Practice

Conserva, Michelangelo, Rauber, Paulo

arXiv.org Artificial Intelligence

Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard (challenging) environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments to conduct such analysis, which is partially due to the lack of a widespread understanding of the rich theory of hardness of environments. The goal of this paper is to unlock the practical usefulness of this theory through four main contributions. First, we present a systematic survey of the theory of hardness, which also identifies promising research directions. Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis and implements a principled benchmark composed of environments that are diverse with respect to different measures of hardness. Third, we present an empirical analysis that provides new insights into computable measures. Finally, we benchmark five tabular agents in our newly proposed benchmark. While advancing the theoretical understanding of hardness in non-tabular reinforcement learning remains essential, our contributions in the tabular setting are intended as solid steps towards a principled non-tabular benchmark. Accordingly, we benchmark four agents in non-tabular versions of Colosseum environments, obtaining results that demonstrate the generality of tabular hardness measures.


Reinforcement Learning for Joint V2I Network Selection and Autonomous Driving Policies

Yan, Zijiang, Tabassum, Hina

arXiv.org Artificial Intelligence

Vehicle-to-Infrastructure (V2I) communication is becoming critical for the enhanced reliability of autonomous vehicles (AVs). However, the uncertainties in the road-traffic and AVs' wireless connections can severely impair timely decision-making. It is thus critical to simultaneously optimize the AVs' network selection and driving policies in order to minimize road collisions while maximizing the communication data rates. In this paper, we develop a reinforcement learning (RL) framework to characterize efficient network selection and autonomous driving policies in a multi-band vehicular network (VNet) operating on conventional sub-6GHz spectrum and Terahertz (THz) frequencies. The proposed framework is designed to (i) maximize the traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration) from autonomous driving perspective, and (ii) maximize the data rates and minimize handoffs by jointly controlling the vehicle's motion dynamics and network selection from telecommunication perspective. We cast this problem as a Markov Decision Process (MDP) and develop a deep Q-learning based solution to optimize the actions such as acceleration, deceleration, lane-changes, and AV-base station assignments for a given AV's state. The AV's state is defined based on the velocities and communication channel states of AVs. Numerical results demonstrate interesting insights related to the inter-dependency of vehicle's motion dynamics, handoffs, and the communication data rate. The proposed policies enable AVs to adopt safe driving behaviors with improved connectivity.


Boosting Algorithms for Estimating Optimal Individualized Treatment Rules

Wang, Duzhe, Fu, Haoda, Loh, Po-Ling

arXiv.org Machine Learning

The proposed algorithms are based on the XGBoost algorithm, which is known as one of the most powerful algorithms in the machine learning literature. Our main idea is to model the conditional mean of clinical outcome or the decision rule via additive regression trees, and use the boosting technique to estimate each single tree iteratively. Our approaches overcome the challenge of correct model specification, which is required in current parametric methods. The major contribution of our proposed algorithms is providing efficient and accurate estimation of the highly nonlinear and complex optimal individualized treatment rules that often arise in practice. Finally, we illustrate the superior performance of our algorithms by extensive simulation studies and conclude with an application to the real data from a diabetes Phase III trial. 1 Introduction Precision medicine, as an emerging medical approach for disease treatment and prevention, has received more and more attention among government, healthcare industry and academia in recent years. It is a well-known fact that there exists a significant heterogeneity for patients in response to treatments. For example, as demonstrated in [9], for patients who are infected with human immunodeficiency virus and tuberculosis, their optimal timing of antiretroviral therapy (ART) varies significantly.