AITopics | Merlis, Nadav

Collaborating Authors

Merlis, Nadav

Lenient Regret for Multi-Armed Bandits

Merlis, Nadav, Mannor, Shie

arXiv.org Machine Learning, Sep-13-2020

We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. For example, in large problems, or when the interaction with the environment is brief, finding an optimal arm is infeasible, and regret-minimizing algorithms tend to over-explore. To overcome this issue, algorithms for such settings should instead focus on playing near-optimal arms. To this end, we suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$. We then present a variant of the Thompson Sampling (TS) algorithm, called $\epsilon$-TS, and prove its asymptotic optimality in terms of the lenient regret. Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant. Finally, we show that $\epsilon$-TS can be applied to improve the performance when the agent knows a lower bound of the suboptimality gaps.
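To make the difference between the two criteria concrete, the following is a minimal sketch that compares the standard regret with a lenient variant. It assumes one simple reading of the criterion: a gap of at most $\epsilon$ counts as zero, and a larger gap is counted in full. The function names, arm means, and pull sequence are hypothetical and are not taken from the paper.

```python
def standard_regret(means, pulls):
    """Cumulative gap between the best arm's mean and the means of the arms played."""
    best = max(means)
    return sum(best - means[a] for a in pulls)


def lenient_regret(means, pulls, eps):
    """Like standard_regret, but gaps of at most eps contribute nothing.

    Assumption: a gap larger than eps is counted in full (a hinge-style rule).
    """
    best = max(means)
    total = 0.0
    for a in pulls:
        gap = best - means[a]
        if gap > eps:
            total += gap
    return total


# Hypothetical three-armed instance: arm 1 is near-optimal, arm 2 is clearly worse.
means = [0.9, 0.85, 0.5]
pulls = [0, 1, 1, 2, 1]

full = standard_regret(means, pulls)
lenient = lenient_regret(means, pulls, eps=0.1)
```

Under this reading, pulls of the near-optimal arm 1 add to the standard regret but not to the lenient one, so only the single pull of arm 2 is penalized.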

artificial intelligence, big data, lenient regret, (19 more...)

arXiv.org Machine Learning

2008.03959

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reinforcement Learning with Trajectory Feedback

Efroni, Yonathan, Merlis, Nadav, Mannor, Shie

arXiv.org Machine Learning, Aug-13-2020

The computational model of reinforcement learning is based upon the ability to query a score of every visited state-action pair, i.e., to observe a per state-action reward signal. However, in practice, it is often the case that such a score is not readily available to the algorithm designer. In this work, we relax this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward from every visited state-action pair, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent. We study natural extensions of reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing the regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a computationally efficient algorithm.
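The least-squares idea can be sketched as follows, under the assumption that a trajectory's score is the sum of the unknown per-pair rewards along it. Each trajectory then becomes a vector of visit counts, and the rewards are recovered by ridge-regularized least squares. All names, the small ridge term, and the noise-free toy data are hypothetical; this is an illustration of the estimation step only, not the authors' algorithm.

```python
def visit_counts(trajectory, num_pairs):
    """Count how often each state-action index appears in one trajectory."""
    counts = [0.0] * num_pairs
    for idx in trajectory:
        counts[idx] += 1.0
    return counts


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_rewards(trajectories, scores, num_pairs, ridge=1e-6):
    """Ridge least squares: solve (X^T X + ridge * I) r = X^T y."""
    rows = [visit_counts(t, num_pairs) for t in trajectories]
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(num_pairs)] for i in range(num_pairs)]
    rhs = [sum(r[i] * y for r, y in zip(rows, scores))
           for i in range(num_pairs)]
    return solve_linear_system(gram, rhs)


# Hypothetical toy data: three state-action pairs with rewards [1.0, 0.0, 0.5].
trajectories = [[0, 1], [1, 2], [0, 2], [0, 0, 1]]
scores = [1.0, 0.5, 1.5, 2.0]  # noise-free sum of rewards along each trajectory
estimate = estimate_rewards(trajectories, scores, num_pairs=3)
```

With enough distinct trajectories the visit-count matrix has full column rank, and the estimate approaches the true per-pair rewards even though no individual reward was ever observed.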

algorithm, artificial intelligence, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2008.06036

Country: Asia > Middle East (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Zahavy, Tom, Haroush, Matan, Merlis, Nadav, Mankowitz, Daniel J., Mannor, Shie

Neural Information Processing Systems, Feb-14-2020, 12:57:22 GMT

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
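The core idea of acting only over non-eliminated actions can be sketched as below. This is a simplification: it assumes the elimination network outputs, per action, a score in [0, 1] for "this action is invalid", and it masks actions by a fixed threshold. The function name, the threshold, and the fallback rule are hypothetical and are not the AE-DQN procedure itself.

```python
def greedy_admissible_action(q_values, elimination_scores, threshold=0.5):
    """Pick the highest-valued action among those not predicted to be invalid.

    q_values: one estimated value per action (e.g. from a Q-network).
    elimination_scores: one predicted "invalid" score per action (assumed in [0, 1]).
    """
    admissible = [a for a, s in enumerate(elimination_scores) if s < threshold]
    if not admissible:
        # Assumed fallback: if everything is eliminated, consider all actions.
        admissible = list(range(len(q_values)))
    return max(admissible, key=lambda a: q_values[a])


# Hypothetical example: action 1 has the largest Q-value but is predicted invalid.
q_values = [1.0, 5.0, 3.0]
elimination_scores = [0.1, 0.9, 0.2]
chosen = greedy_admissible_action(q_values, elimination_scores)
```

Masking before the argmax means the value estimates of eliminated actions never influence the choice, which is where the reduction in effective action-space size comes from.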

artificial intelligence, machine learning, reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

Tessler, Chen, Merlis, Nadav, Mannor, Shie

arXiv.org Artificial Intelligence, Oct-2-2019

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains. However, these algorithms lack the theoretical guarantees that are present in the tabular setting and suffer from many stability and reproducibility problems (Henderson et al., 2018). In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in off-policy actor-critic deep reinforcement learning regimes. Experiments on continuous action spaces, in the MuJoCo control suite, show that our proposed method reduces the variance of the process and improves the overall performance.

artificial intelligence, deep learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

1910.01062

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

Merlis, Nadav, Mannor, Shie

arXiv.org Machine Learning, May-29-2019

We consider the combinatorial multi-armed bandit (CMAB) problem, where the reward function is nonlinear. In this setting, the agent chooses a batch of arms on each round and receives feedback from each arm of the batch. The reward that the agent aims to maximize is a function of the selected arms and their expectations. In many applications, the reward function is highly nonlinear, and the performance of existing algorithms relies on a global Lipschitz constant to encapsulate the function's nonlinearity. This may lead to loose regret bounds, since a large gradient by itself does not necessarily cause a large regret; it does so only in regions where the uncertainty in the reward's parameters is high. To overcome this problem, we introduce a new smoothness criterion, which we term Gini-weighted smoothness, that takes into account both the nonlinearity of the reward and concentration properties of the arms. We show that a linear dependence of the regret on the batch size in existing algorithms can be replaced by this smoothness parameter. This, in turn, leads to much tighter regret bounds when the smoothness parameter is batch-size independent. For example, in the probabilistic maximum coverage (PMC) problem, which has many applications, including influence maximization and diverse recommendations, we achieve dramatic improvements in the upper bounds. We also prove matching lower bounds for the PMC problem and show that our algorithm is tight, up to a logarithmic factor in the problem's parameters.
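As an illustration of why the PMC reward is nonlinear in the arms' expectations, here is a small sketch of the expected coverage under the usual bipartite formulation: each chosen source covers a target independently with some edge probability, and the reward is the expected number of covered targets. The function name, the dictionary layout, and the example probabilities are hypothetical.

```python
def expected_coverage(batch, edge_probs, num_targets):
    """Expected number of targets covered by at least one source in the batch.

    edge_probs maps (source, target) to the probability that the source covers
    the target; missing pairs are treated as probability 0.
    """
    total = 0.0
    for j in range(num_targets):
        miss = 1.0  # probability that no chosen source covers target j
        for i in batch:
            miss *= 1.0 - edge_probs.get((i, j), 0.0)
        total += 1.0 - miss
    return total


# Hypothetical instance with two sources and two targets.
edge_probs = {(0, 0): 0.5, (1, 0): 0.5, (1, 1): 1.0}
both = expected_coverage([0, 1], edge_probs, num_targets=2)
only_first = expected_coverage([0], edge_probs, num_targets=2)
```

The product over sources makes the reward a non-additive function of the individual edge probabilities: adding a second source that covers an already likely-covered target yields less than its stand-alone contribution.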

artificial intelligence, batch-size independent regret, big data, (16 more...)

arXiv.org Machine Learning

1905.03125

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Efroni, Yonathan, Merlis, Nadav, Ghavamzadeh, Mohammad, Mannor, Shie

arXiv.org Artificial Intelligence, May-27-2019

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full-planning on Markov Decision Processes (MDPs) built by the gathered experience. In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies, which act by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$. Thus, full-planning in model-based RL can be avoided altogether without any performance degradation, and, by doing so, the computational complexity decreases by a factor of $S$. The results are based on a novel analysis of real-time dynamic programming, then extended to model-based RL. Specifically, we generalize existing algorithms that perform full-planning to ones that act by 1-step planning. For these generalizations, we prove regret bounds with the same rate as their full-planning counterparts.
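A single 1-step planning backup can be sketched as follows. The sketch assumes a known (or already estimated) reward table and transition model and a table of value estimates for the next time step; it omits the exploration bonuses and the exact update rule the paper analyzes. All names and the two-state example are hypothetical.

```python
def one_step_greedy(state, rewards, transitions, next_values):
    """Return (action, value) maximizing reward plus expected next-step value.

    rewards[s][a]: immediate reward for action a in state s.
    transitions[s][a]: dict mapping next state to its probability.
    next_values[s2]: current value estimate of s2 at the following time step.
    """
    best_action, best_value = None, float("-inf")
    for action, reward in enumerate(rewards[state]):
        lookahead = reward + sum(
            p * next_values[s2] for s2, p in transitions[state][action].items()
        )
        if lookahead > best_value:
            best_action, best_value = action, lookahead
    return best_action, best_value


# Hypothetical two-state, two-action model.
rewards = [[0.0, 1.0], [0.0, 0.0]]
transitions = [
    [{1: 1.0}, {0: 1.0}],  # from state 0: action 0 moves to state 1, action 1 stays
    [{1: 1.0}, {1: 1.0}],  # from state 1: both actions stay in state 1
]
next_values = [0.0, 5.0]
action, value = one_step_greedy(0, rewards, transitions, next_values)
```

Only the state currently visited is backed up, so the agent never sweeps over all states before acting. This is the source of the factor-of-$S$ saving the abstract mentions relative to full-planning.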

artificial intelligence, failure event, us government, (18 more...)

arXiv.org Artificial Intelligence

1905.11527

Country: North America > United States (0.30)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Zahavy, Tom, Haroush, Matan, Merlis, Nadav, Mankowitz, Daniel J., Mannor, Shie

Neural Information Processing Systems, Dec-31-2018

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.

artificial intelligence, deep learning, neural network, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.14)
Europe > United Kingdom > England (0.14)

Industry:

Leisure & Entertainment > Games (1.00)
Energy > Power Industry (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Zahavy, Tom, Haroush, Matan, Merlis, Nadav, Mankowitz, Daniel J., Mannor, Shie

arXiv.org Machine Learning, Sep-6-2018

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.

action elimination, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

1809.02121

Country: North America (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (1.00)
Energy > Power Industry (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
