AITopics | Talebi, Mohammad Sadegh

Collaborating Authors

Talebi, Mohammad Sadegh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provably Efficient Exploration in Reward Machines with Low Regret

Bourel, Hippolyte, Jonsson, Anders, Maillard, Odalric-Ambrym, Ma, Chenxiao, Talebi, Mohammad Sadegh

arXiv.org Artificial IntelligenceDec-26-2024

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with initially unknown dynamics, and investigate RL under the average-reward criterion, where the learning performance is assessed through the notion of regret. Our main algorithmic contribution is a model-based RL algorithm for decision processes involving probabilistic reward machines that is capable of exploiting the structure induced by such machines. We further derive high-probability and non-asymptotic bounds on its regret and demonstrate the gain in terms of regret over existing algorithms that could be applied, but obliviously to the structure. We also present a regret lower bound for the studied setting. To the best of our knowledge, the proposed algorithm constitutes the first attempt to tailor and analyze regret specifically for RL with probabilistic reward machines.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2412.19194

Country: Europe > France (0.27)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback

No-regret Exploration in Shuffle Private Reinforcement Learning

Bai, Shaojie, Talebi, Mohammad Sadegh, Zhao, Chengcheng, Cheng, Peng, Chen, Jiming

arXiv.org Artificial IntelligenceNov-18-2024

Differential privacy (DP) has recently been introduced into episodic reinforcement learning (RL) to formally address user privacy concerns in personalized services. Previous work mainly focuses on two trust models of DP: the central model, where a central agent is responsible for protecting users' sensitive data, and the (stronger) local model, where the protection occurs directly on the user side. However, they either require a trusted central agent or incur a significantly higher privacy cost, making it unsuitable for many scenarios. This work introduces a trust model stronger than the central model but with a lower privacy cost than the local model, leveraging the emerging \emph{shuffle} model of privacy. We present the first generic algorithm for episodic RL under the shuffle model, where a trusted shuffler randomly permutes a batch of users' data before sending it to the central agent. We then instantiate the algorithm using our proposed shuffle Privatizer, relying on a shuffle private binary summation mechanism. Our analysis shows that the algorithm achieves a near-optimal regret bound comparable to that of the centralized model and significantly outperforms the local model in terms of privacy cost.

artificial intelligence, machine learning, shuffle private reinforcement learning, (1 more...)

arXiv.org Artificial Intelligence

2411.11647

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Add feedback

Improved Exploration in Factored Average-Reward MDPs

Talebi, Mohammad Sadegh, Jonsson, Anders, Maillard, Odalric-Ambrym

arXiv.org Machine LearningSep-9-2020

In reinforcement learning (RL), an agent repeatedly interacts with an unknown environment in order to maximize its cumulative reward. A typical model of the environment is a Markov decision process (MDP): in each decision epoch, the agent observes a state, takes an action and receives a reward before transiting to the next state. To achieve its objective, the agent has to estimate the parameters of the MDP from experience and learn a policy that maps states to actions. While doing so, the agent faces a choice between two basic strategies: exploration, i.e. discovering the effects of actions on the environment, and exploitation, i.e. using its current knowledge to maximize reward in the short term. Most of model-based RL algorithms treat the state as a black box.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2009.04575

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Tightening Exploration in Upper Confidence Reinforcement Learning

Bourel, Hippolyte, Maillard, Odalric-Ambrym, Talebi, Mohammad Sadegh

arXiv.org Machine LearningJun-8-2020

The upper confidence reinforcement learning (UCRL2) strategy introduced in (Jaksch et al., 2010) is a popular method to perform regret minimization in unknown discrete Markov Decision Processes under the average-reward criterion. Despite its nice and generic theoretical regret guarantees, this strategy and its variants have remained until now mostly theoretical as numerical experiments on simple environments exhibit long burn-in phases before the learning takes place. Motivated by practical efficiency, we present UCRL3, following the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities, to compute confidence sets on the reward and transition distributions for each state-action pair. To further tighten exploration, we introduce an adaptive computation of the support of each transition distributions. This enables to revisit the extended value iteration procedure to optimize over distributions with reduced support by disregarding low probability transitions, while still ensuring near-optimism. We demonstrate, through numerical experiments on standard environments, that reducing exploration this way yields a substantial numerical improvement compared to UCRL2 and its variants. On the theoretical side, these key modifications enable to derive a regret bound for UCRL3 improving on UCRL2, that for the first time makes appear a notion of local diameter and effective support, thanks to variance-aware concentration bounds.

artificial intelligence, optimization problem, ucrl2, (20 more...)

arXiv.org Machine Learning

2004.09656

Country:

Europe (0.46)
North America > United States (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Asadi, Mahsa, Talebi, Mohammad Sadegh, Bourel, Hippolyte, Maillard, Odalric-Ambrym

arXiv.org Artificial IntelligenceOct-9-2019

Leveraging an equivalence property in the state-space of a Markov Decision Process (MDP) has been investigated in several studies. This paper studies equivalence structure in the reinforcement learning (RL) setup, where transition distributions are no longer assumed to be known. We present a notion of similarity between transition probabilities of various state-action pairs of an MDP, which naturally defines an equivalence structure in the state-action space. We present equivalence-aware confidence sets for the case where the learner knows the underlying structure in advance. These sets are provably smaller than their corresponding equivalence-oblivious counterparts. In the more challenging case of an unknown equivalence structure, we present an algorithm called ApproxEquivalence that seeks to find an (approximate) equivalence structure, and define confidence sets using the approximate equivalence. To illustrate the efficacy of the presented confidence sets, we present C-UCRL, as a natural modification of UCRL2 for RL in undiscounted MDPs. In the case of a known equivalence structure, we show that C-UCRL improves over UCRL2 in terms of regret by a factor of $\sqrt{SA/C}$, in any communicating MDP with $S$ states, $A$ actions, and $C$ classes, which corresponds to a massive improvement when $C \ll SA$. To the best of our knowledge, this is the first work providing regret bounds for RL when an equivalence structure in the MDP is efficiently exploited. In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.

artificial intelligence, null, survey article, (17 more...)

arXiv.org Artificial Intelligence

1910.04077

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)
Europe > France (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

Talebi, Mohammad Sadegh, Maillard, Odalric-Ambrym

arXiv.org Machine LearningMar-5-2018

The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered, when the learner interacts with the system in a single stream of observations, starting from an initial state without any reset. We revisit the minimax lower bound for that problem by making appear the local variance of the bias function in place of the diameter of the MDP. Furthermore, we provide a novel analysis of the KL-UCRL algorithm establishing a high-probability regret bound scaling as $\widetilde {\mathcal O}\Bigl({\textstyle \sqrt{S\sum_{s,a}{\bf V}^\star_{s,a}T}}\Big)$ for this algorithm for ergodic MDPs, where $S$ denotes the number of states and where ${\bf V}^\star_{s,a}$ is the variance of the bias function with respect to the next-state distribution following action $a$ in state $s$. The resulting bound improves upon the best previously known regret bound $\widetilde {\mathcal O}(DS\sqrt{AT})$ for that algorithm, where $A$ and $D$ respectively denote the maximum number of actions (per state) and the diameter of MDP. We finally compare the leading terms of the two bounds in some benchmark MDPs indicating that the derived bound can provide an order of magnitude improvement in some cases. Our analysis leverages novel variations of the transportation lemma combined with Kullback-Leibler concentration inequalities, that we believe to be of independent interest.

artificial intelligence, probability, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1803.01626

Country: Europe > France (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback