
Collaborating Authors

Mondal, Washim Uddin


Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

arXiv.org Machine Learning

We present the first finite-sample analysis for policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior works in this setting have established only asymptotic convergence guarantees, leaving open the question of sample complexity. In this work, we address this gap by establishing that the robust Bellman operator is a contraction under the span semi-norm, and developing a stochastic approximation framework with controlled bias. Our approach builds upon Multi-Level Monte Carlo (MLMC) techniques to estimate the robust Bellman operator efficiently. To overcome the infinite expected sample complexity inherent in standard MLMC, we introduce a truncation mechanism based on a geometric distribution, ensuring a finite constant sample complexity while maintaining a small bias that decays exponentially with the truncation level. Our method achieves the order-optimal sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ for robust policy evaluation and robust average reward estimation, marking a significant advancement in robust reinforcement learning theory.
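
A minimal Python sketch of the truncated MLMC idea described above, applied to a toy scalar mean rather than the robust Bellman operator (the sampler, level cap, and constants are illustrative assumptions, not the paper's construction): a level is drawn from a geometric distribution truncated at a maximum level, the coupled level difference is reweighted by its truncated probability, and the truncation keeps the expected number of samples per call finite at the cost of an exponentially small bias.

import numpy as np

def truncated_mlmc_mean(sample_fn, l_max=8, rng=None):
    # One truncated-MLMC estimate of E[X] for a scalar sampler `sample_fn`.
    rng = np.random.default_rng() if rng is None else rng
    # Geometric level distribution p_l proportional to 2^(-l), truncated at l_max:
    # the truncation keeps the expected number of samples per call finite.
    levels = np.arange(1, l_max + 1)
    p = 2.0 ** (-levels.astype(float))
    p /= p.sum()
    L = int(rng.choice(levels, p=p))

    # Coupled level difference: the coarse average reuses the first half of
    # the 2^L fine-level samples, so the expectations telescope across levels.
    x = sample_fn(2 ** L, rng)
    delta = x.mean() - x[: 2 ** (L - 1)].mean()

    base = sample_fn(1, rng).mean()              # level-0 term
    return base + delta / p[L - 1]               # reweight by the truncated level probability

rng = np.random.default_rng(0)
bernoulli = lambda n, rng: rng.binomial(1, 0.3, size=n).astype(float)
est = np.mean([truncated_mlmc_mean(bernoulli, rng=rng) for _ in range(20000)])
print("truncated-MLMC estimate:", round(float(est), 3), "(true mean 0.3)")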


Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific constraints during the learning process. This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs). The investigation commences with an examination of model-based strategies, delving into two foundational methods - optimism in the face of uncertainty and posterior sampling. Subsequently, the discussion transitions to parametrized model-free approaches, where the primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs. The monograph provides regret guarantees and analyzes constraint violation for each of the discussed setups. For the above exploration, we assume the underlying MDP to be ergodic. Further, this monograph extends its discussion to encompass results tailored for weakly communicating MDPs, thereby broadening the scope of its findings and their relevance to a wider range of practical scenarios.


Sample-Efficient Constrained Reinforcement Learning with General Parameterization

arXiv.org Artificial Intelligence

We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain threshold. Building on the idea of momentum-based acceleration, we develop the Primal-Dual Accelerated Natural Policy Gradient (PD-ANPG) algorithm that guarantees an $\epsilon$ global optimality gap and $\epsilon$ constraint violation with $\mathcal{O}(\epsilon^{-3})$ sample complexity. This improves the state-of-the-art sample complexity in CMDP by a factor of $\mathcal{O}(\epsilon^{-1})$.
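
For intuition, here is a hedged Python sketch of a generic primal-dual policy gradient loop on a toy constrained bandit; the rewards, costs, step sizes, and the plain heavy-ball momentum term are made-up stand-ins, whereas PD-ANPG itself accelerates natural policy gradient updates under a general parameterization.

import numpy as np

rng = np.random.default_rng(0)
r = np.array([1.0, 0.2, 0.5])        # per-action rewards
c = np.array([0.0, 1.0, 0.6])        # per-action constraint utilities
b = 0.5                              # constraint: E[c] >= b

theta, lam, mom = np.zeros(3), 0.0, np.zeros(3)
eta_th, eta_lam, beta, T = 0.05, 0.02, 0.5, 10000
avg_pi = np.zeros(3)

for t in range(T):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    avg_pi += pi / T                                  # analyses track averaged iterates
    a = int(rng.choice(3, p=pi))
    score = -pi.copy(); score[a] += 1.0               # gradient of log pi(a | theta)
    g = (r[a] + lam * (c[a] - b)) * score             # sampled gradient of the Lagrangian
    mom = beta * mom + g                              # momentum accumulation
    theta += eta_th * mom                             # primal (policy) ascent
    lam = max(0.0, min(10.0, lam - eta_lam * (c[a] - b)))  # projected dual descent

print("averaged policy:", np.round(avg_pi, 2),
      " E[r] =", round(float(avg_pi @ r), 2),
      " E[c] =", round(float(avg_pi @ c), 2), "(threshold b = 0.5)")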


Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

arXiv.org Artificial Intelligence

We present two Policy Gradient-based methods with general parameterization in the context of infinite horizon average reward Markov Decision Processes. The first approach employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of the order $\tilde{\mathcal{O}}(T^{3/5})$. The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$. These results significantly improve upon the previous state of the art for this problem, a regret of $\tilde{\mathcal{O}}(T^{3/4})$.
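
The following Python sketch isolates the Implicit Gradient Transport mechanism on a noisy quadratic (a hypothetical stand-in for the policy gradient objective): the stochastic gradient is queried at an extrapolated point so that the running average tracks the gradient at the current iterate, which is the variance-reduction effect the first method above relies on.

import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([2.0, -1.0])

def noisy_grad(theta):
    # Gradient of 0.5 * ||theta - theta_star||^2 plus Gaussian noise.
    return (theta - theta_star) + rng.normal(scale=2.0, size=theta.shape)

theta, theta_prev, v, eta = np.zeros(2), np.zeros(2), np.zeros(2), 0.1

for t in range(1, 2001):
    gamma = 1.0 / t
    # Query the gradient at an extrapolated point so the running average v
    # tracks the gradient at the current iterate ("transporting" past gradients).
    query = theta + ((1.0 - gamma) / gamma) * (theta - theta_prev)
    v = (1.0 - gamma) * v + gamma * noisy_grad(query)
    theta_prev = theta.copy()
    theta = theta - eta * v                  # descent step with the low-variance average

print("theta:", np.round(theta, 3), " target:", theta_star)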


Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

arXiv.org Artificial Intelligence

We consider the problem of designing sample-efficient learning algorithms for infinite horizon discounted reward Markov Decision Processes. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}({\epsilon^{-2}})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization, where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and, unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
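
As a rough illustration of the inner step, the Python sketch below fits a natural-gradient direction w by solving a compatible least-squares problem $\min_w \mathbb{E}[(\nabla\log\pi\cdot w - A)^2]$ with a Nesterov-accelerated stochastic gradient loop; the synthetic features and targets stand in for score vectors and advantage estimates, and the step sizes are assumptions rather than the paper's choices.

import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 5000
w_true = rng.normal(size=d)
feats = rng.normal(size=(n, d))                       # stand-in for score vectors grad log pi
adv = feats @ w_true + rng.normal(scale=0.1, size=n)  # stand-in for advantage estimates

w, w_prev = np.zeros(d), np.zeros(d)
eta, beta = 0.02, 0.8                                 # hand-tuned step size and momentum

for i in rng.permutation(n):
    lookahead = w + beta * (w - w_prev)               # Nesterov look-ahead point
    g = (feats[i] @ lookahead - adv[i]) * feats[i]    # stochastic least-squares gradient
    w_prev = w
    w = lookahead - eta * g

print("fitted direction w:", np.round(w, 2))
print("ground truth      :", np.round(w_true, 2))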


Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

arXiv.org Artificial Intelligence

The framework of Reinforcement Learning (RL) is concerned with a class of problems where an agent learns to maximize its cumulative reward in an unknown environment via repeated interaction. RL finds applications in diverse areas, such as wireless communication, transportation, and epidemic control (Yang et al., 2020; Al-Abbasi et al., 2019; Ling et al., 2023). RL problems are mainly categorized into three setups: episodic, infinite horizon discounted reward, and infinite horizon average reward. Among them, the infinite horizon average reward setup is particularly significant for real-world applications.


Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

arXiv.org Artificial Intelligence

In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Unlike existing works in this context, our approach uses a general policy gradient-based algorithm and does not assume a linear MDP structure. We propose a policy gradient-based algorithm and show its global convergence property. We then prove that the proposed algorithm achieves $\tilde{\mathcal{O}}({T}^{3/4})$ regret. To the best of our knowledge, this is the first regret bound for a general parameterized policy gradient algorithm in the average reward setting.
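
To make the setting concrete, here is a hedged Python sketch of one iteration of an average-reward policy gradient method on a small two-state MDP: the average reward is estimated from a single long trajectory, and the gradient is formed from score functions weighted by truncated differential returns. The MDP, the softmax-tabular parameterization, and the window length are illustrative assumptions, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s']: transition probabilities
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                 # R[s, a]: rewards
              [0.0, 2.0]])
theta = np.zeros((2, 2))                  # softmax logits per state (tabular stand-in)

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

# Roll out one long trajectory under the current policy.
T, N, s = 5000, 100, 0
states, actions, rewards = [], [], []
for _ in range(T):
    pi = policy(s)
    a = int(rng.choice(2, p=pi))
    states.append(s); actions.append(a); rewards.append(R[s, a])
    s = int(rng.choice(2, p=P[s, a]))

J_hat = float(np.mean(rewards))           # estimate of the average reward J(theta)

# Gradient estimate: score function times a truncated differential return
# sum_{k=t}^{t+N-1} (r_k - J_hat), an estimate of the differential Q-value.
grad = np.zeros_like(theta)
for t in range(T - N):
    s_t, a_t = states[t], actions[t]
    q_hat = float(np.sum(np.array(rewards[t:t + N]) - J_hat))
    score = -policy(s_t); score[a_t] += 1.0
    grad[s_t] += score * q_hat
grad /= (T - N)

theta += 0.1 * grad                       # one policy gradient ascent step
print("estimated average reward:", round(J_hat, 3))
print("policy gradient estimate:\n", np.round(grad, 3))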


Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

arXiv.org Artificial Intelligence

We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback. Delay and compositeness mean that the reward generated by taking an action at a given state is fragmented into several components that are realized sequentially at delayed time instants. Partial anonymity means that, for each state, the learner only observes the aggregate of past reward components that were generated by different actions taken at that state but happen to be realized at the current observation instant. We propose an algorithm named $\mathrm{DUCRL2}$ to obtain a near-optimal policy for this setting and show that it achieves a regret bound of $\tilde{\mathcal{O}}\left(DS\sqrt{AT} + d (SA)^3\right)$, where $S$ and $A$ are the sizes of the state and action spaces, respectively, $D$ is the diameter of the MDP, $d$ is a parameter upper bounded by the maximum reward delay, and $T$ denotes the time horizon. This shows that the bound is order-optimal in $T$ and that the delay contributes only an additive term.
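
The Python sketch below simulates only the feedback model described above, not the $\mathrm{DUCRL2}$ algorithm: each reward is split into components that arrive after delays, and at every step the learner observes a single per-state aggregate without knowing which actions produced it. All sizes, delays, and the behavior policy are made-up.

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n_states, n_actions, d_max, T = 3, 2, 5, 20
R = rng.uniform(size=(n_states, n_actions))     # true mean rewards, unknown to the learner

pending = defaultdict(float)                    # (arrival_time, state) -> aggregated reward
s = 0
for t in range(T):
    a = int(rng.integers(n_actions))            # arbitrary behavior policy (uniform)
    # Composite + delayed: split this step's reward into fragments realized
    # over the next d_max steps.
    fragments = rng.dirichlet(np.ones(d_max)) * R[s, a]
    for delay, frag in enumerate(fragments):
        pending[(t + delay, s)] += frag
    # Partially anonymous: one aggregate number per state at each step, with
    # no attribution to the actions that generated the fragments.
    obs = {x: round(pending.pop((t, x), 0.0), 2) for x in range(n_states)}
    print(f"t={t:2d}  s={s}  a={a}  per-state aggregates observed: {obs}")
    s = int(rng.integers(n_states))             # arbitrary transition for the sketch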


Mean-Field Control based Approximation of Multi-Agent Reinforcement Learning in Presence of a Non-decomposable Shared Global State

arXiv.org Artificial Intelligence

Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems. However, the success of MFC relies on the presumption that, given the local states and actions of all the agents, the next (local) states of the agents evolve conditionally independently of each other. Here we demonstrate that even in a MARL setting where agents share a common global state in addition to their local states evolving conditionally independently (thus introducing a correlation between the state transition processes of individual agents), MFC can still be applied as a good approximation tool. The global state is assumed to be non-decomposable, i.e., it cannot be expressed as a collection of local states of the agents. We compute the approximation error as $\mathcal{O}(e)$ where $e=\frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|} +\sqrt{|\mathcal{U}|}\right]$. The size of the agent population is denoted by the term $N$, and $|\mathcal{X}|, |\mathcal{U}|$ respectively indicate the sizes of the (local) state and action spaces of individual agents. The approximation error is found to be independent of the size of the shared global state space. We further demonstrate that, in the special case where the reward and state transition functions are independent of the action distribution of the population, the error can be improved to $e=\frac{\sqrt{|\mathcal{X}|}}{\sqrt{N}}$. Finally, we devise a Natural Policy Gradient based algorithm that solves the MFC problem with $\mathcal{O}(\epsilon^{-3})$ sample complexity and obtains a policy that is within $\mathcal{O}(\max\{e,\epsilon\})$ error of the optimal MARL policy for any $\epsilon>0$.
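
For a sense of scale, the snippet below plugs illustrative (made-up) sizes into the two error bounds quoted above; note that neither expression involves the size of the shared global state space.

import math

N, X, U = 10_000, 20, 5                  # agents, |local state space|, |local action space|
e_general = (math.sqrt(X) + math.sqrt(U)) / math.sqrt(N)   # general case
e_special = math.sqrt(X) / math.sqrt(N)                    # action-distribution-independent case
print(f"general case:  e = {e_general:.4f}")
print(f"special case:  e = {e_special:.4f}")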


Cooperating Graph Neural Networks with Deep Reinforcement Learning for Vaccine Prioritization

arXiv.org Artificial Intelligence

This study explores vaccine prioritization strategies for reducing the overall burden of the pandemic when the vaccine supply is limited. Existing methods conduct macro-level or simplified micro-level vaccine distribution by assuming homogeneous behavior within subgroup populations, and they lack integration of mobility dynamics. Directly applying these models to micro-level vaccine allocation leads to sub-optimal solutions due to the lack of behavioral detail. To address this issue, we first incorporate mobility heterogeneity in disease dynamics modeling and mimic the disease evolution process using a Trans-vaccine-SEIR model. We then develop a novel deep reinforcement learning method to find the optimal vaccine allocation strategy for the high-degree spatial-temporal disease evolution system. A graph neural network is used to effectively capture the structural properties of the mobility contact network and extract the dynamic disease features. In our evaluation, the proposed framework reduces infections and deaths by 7%-10% relative to the baseline strategies. Extensive evaluation shows that the proposed framework is robust in finding the optimal vaccine allocation under diverse mobility patterns in the micro-level disease evolution system. In particular, we find that the optimal vaccine allocation strategy in the transit usage restriction scenario is significantly more effective than restricting cross-zone mobility for the top 10% age-based and income-based zones. These results provide valuable insights for areas with limited vaccine supply and low logistic efficacy.
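
As background for the dynamics model, here is a generic discrete-time SEIR update with a vaccination term; it is only a hedged illustration of the compartmental structure that a Trans-vaccine-SEIR-style model builds on, and it omits the mobility contact network coupling that the paper's model includes. All rates and population numbers are hypothetical.

def seir_step(S, E, I, R, beta=0.3, sigma=0.2, gamma=0.1, vax=0.01):
    # One discrete-time step; the total population S + E + I + R is conserved.
    N = S + E + I + R
    new_exposed = beta * S * I / N        # S -> E via contact with infectious individuals
    new_infectious = sigma * E            # E -> I after incubation
    new_recovered = gamma * I             # I -> R
    new_vaccinated = vax * S              # S -> R via (limited) vaccination
    return (S - new_exposed - new_vaccinated,
            E + new_exposed - new_infectious,
            I + new_infectious - new_recovered,
            R + new_recovered + new_vaccinated)

state = (9900.0, 50.0, 50.0, 0.0)         # S, E, I, R for a single zone
for day in range(120):
    state = seir_step(*state)
print("S, E, I, R after 120 days:", tuple(round(x, 1) for x in state))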