AITopics

2007.00722

Country:

North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)

Genre:

Research Report (1.00)
Workflow (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine Learning, May-23-2020

A Novel Confidence-Based Algorithm for Structured Bandits

Tirinzoni, Andrea, Lazaric, Alessandro, Restelli, Marcello

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of times a suboptimal arm is selected may actually be reduced thanks to the information collected by pulling other arms. Furthermore, we show that, in some structures, the regret of an anytime extension of our algorithm is uniformly bounded over time. For these constant-regret structures, we also derive a matching lower bound. Finally, we demonstrate numerically that our approach better exploits certain structures than existing methods.
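The idea of discarding arms through confidence sets over the problem parameters can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes, purely for illustration, that the structure is a finite list of candidate models (`models`, each a tuple of arm means) and that rewards lie in [0, 1], so a Hoeffding radius applies. All function names are hypothetical.

```python
import math


def confidence_radius(pulls, delta):
    """Hoeffding radius for the mean of `pulls` rewards bounded in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * pulls))


def surviving_models(models, empirical_means, pulls, delta):
    """Keep the candidate models that agree with every arm pulled so far."""
    kept = []
    for model in models:
        consistent = all(
            n == 0 or abs(model[arm] - empirical_means[arm]) <= confidence_radius(n, delta)
            for arm, n in enumerate(pulls)
        )
        if consistent:
            kept.append(model)
    return kept


def candidate_arms(models):
    """Arms that are optimal in at least one surviving model."""
    return sorted({max(range(len(m)), key=lambda a: m[a]) for m in models})


# Hypothetical structure: two candidate models over three arms.
models = [(0.9, 0.1, 0.5), (0.1, 0.9, 0.5)]
# Only arm 0 has been pulled (100 times, empirical mean 0.85).
kept = surviving_models(models, empirical_means=[0.85, 0.0, 0.0], pulls=[100, 0, 0], delta=0.05)
arms = candidate_arms(kept)
```

In this toy instance, pulling arm 0 alone rules out the second model, so arm 1 is discarded without ever being selected. This is the kind of cross-arm information the abstract refers to.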

algorithm, artificial intelligence, big data, (16 more...)

2005.11593

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)

arXiv.org Artificial Intelligence, Sep-9-2019

Gradient-Aware Model-based Policy Search

D'Oro, Pierluca, Metelli, Alberto Maria, Tirinzoni, Andrea, Papini, Matteo, Restelli, Marcello

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains, analyzing and discussing its properties.

artificial intelligence, reinforcement learning, (18 more...)

1909.04115

Country:

Europe (0.46)
North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Sports > Golf (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning, Jul-17-2019

Feature Selection via Mutual Information: New Theoretical Insights

Beraha, Mario, Metelli, Alberto Maria, Papini, Matteo, Tirinzoni, Andrea, Restelli, Marcello

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that conditional mutual information naturally arises when bounding the ideal regression/classification errors achieved by different subsets of features. Leveraging these insights, we propose a novel stopping condition for backward and forward greedy methods which ensures that the ideal prediction error using the selected feature subset remains bounded by a user-specified threshold. We provide numerical simulations to support our theoretical claims and compare to common heuristic methods.
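A minimal sketch of backward greedy elimination driven by conditional mutual information may help make the stopping idea concrete. It assumes discrete features and a plug-in (count-based) estimator, and it stops once the accumulated score of the removed features would exceed a hypothetical `budget`. How such a budget maps to a bound on the prediction error is established in the paper and is not reproduced here; the function names are illustrative only.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), c in c_xyz.items():
        total += (c / n) * log(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return total


def backward_elimination(columns, target, budget):
    """Drop the least informative feature while the accumulated score fits the budget."""
    remaining = list(range(len(columns)))
    spent = 0.0
    while remaining:
        scores = []
        for i in remaining:
            rest = [j for j in remaining if j != i]
            # Condition on the joint value of all other remaining features.
            zs = list(zip(*(columns[j] for j in rest))) if rest else [()] * len(target)
            scores.append((conditional_mutual_information(columns[i], target, zs), i))
        score, worst = min(scores)
        if spent + score > budget:
            break
        spent += score
        remaining.remove(worst)
    return remaining


# Toy data: the target copies feature 0; feature 1 carries no information about it.
x0 = [0, 1, 0, 1, 0, 1, 0, 1]
x1 = [0, 0, 1, 1, 0, 0, 1, 1]
y = list(x0)
selected = backward_elimination([x0, x1], y, budget=0.1)
```

On the toy data, feature 1 has zero conditional mutual information with the target given feature 0 and is removed, while removing feature 0 would cost about log 2 nats and is rejected by the budget.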

artificial intelligence, feature selection, machine learning, (19 more...)

1907.07384

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Transfer of Value Functions via Variational Methods

Tirinzoni, Andrea, Sanchez, Rafael Rodriguez, Restelli, Marcello

We consider the problem of transferring value functions in reinforcement learning. We propose an approach that uses the given source tasks to learn a prior distribution over optimal value functions and provide an efficient variational approximation of the corresponding posterior in a new target task. We show our approach to be general, in the sense that it can be combined with complex parametric function approximators and distribution models, while providing two practical algorithms based on Gaussians and Gaussian mixtures. We theoretically analyze them by deriving a finite-sample analysis and provide a comprehensive empirical evaluation in four different domains.

data mining, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Data Science > Data Mining (0.93)

Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes

Tirinzoni, Andrea, Petrik, Marek, Chen, Xiangli, Ziebart, Brian

What policy should be employed in a Markov decision process with uncertain parameters? Robust optimization's answer to this question is to use rectangular uncertainty sets, which independently reflect available knowledge about each state, and then to obtain a decision policy that maximizes the expected reward for the worst-case decision process parameters from these uncertainty sets. While this rectangularity is convenient computationally and leads to tractable solutions, it often produces policies that are too conservative in practice, and does not facilitate knowledge transfer between portions of the state space or across related decision processes. In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty of the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally.
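For context, the rectangular baseline that the abstract contrasts against can be sketched as robust value iteration, where the worst case is taken independently for each state-action pair. The sketch below assumes finite uncertainty sets (a list of candidate next-state distributions per pair) and uses hypothetical names. It illustrates only the rectangular setting, not the non-rectangular method proposed in the paper.

```python
def robust_value_iteration(rewards, uncertainty, gamma, iterations):
    """Value iteration with an independent worst case per (state, action).

    rewards[s][a] is the immediate reward; uncertainty[s][a] is a list of
    candidate next-state distributions (each a list of probabilities).
    """
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(iterations):
        new_values = []
        for s in range(n_states):
            best = None
            for a in range(len(rewards[s])):
                # Adversary picks the distribution minimizing the next value.
                worst = min(
                    sum(p[t] * values[t] for t in range(n_states))
                    for p in uncertainty[s][a]
                )
                q = rewards[s][a] + gamma * worst
                best = q if best is None or q > best else best
            new_values.append(best)
        values = new_values
    return values


# Toy problem: state 1 is absorbing with zero reward. From state 0 the single
# action yields reward 1 and either stays in state 0 or moves to state 1.
rewards = [[1.0], [0.0]]
uncertainty = [[[[1.0, 0.0], [0.0, 1.0]]], [[[0.0, 1.0]]]]
values = robust_value_iteration(rewards, uncertainty, gamma=0.9, iterations=50)
```

In the toy problem the adversary always sends state 0 to the absorbing state, so the robust value of state 0 is 1 rather than the optimistic 10 that staying forever would give. This illustrates the conservatism the abstract mentions.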

constraint, decision support system, machine learning, (20 more...)

Country:

North America > United States > Illinois (0.14)
Europe > United Kingdom > Scotland (0.14)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
(2 more...)

Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes

Tirinzoni, Andrea, Petrik, Marek, Chen, Xiangli, Ziebart, Brian

What policy should be employed in a Markov decision process with uncertain parameters? Robust optimization's answer to this question is to use rectangular uncertainty sets, which independently reflect available knowledge about each state, and then to obtain a decision policy that maximizes the expected reward for the worst-case decision process parameters from these uncertainty sets. While this rectangularity is convenient computationally and leads to tractable solutions, it often produces policies that are too conservative in practice, and does not facilitate knowledge transfer between portions of the state space or across related decision processes. In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty of the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally.

artificial intelligence, constraint, reinforcement learning, (18 more...)

Country: North America > United States > Illinois (0.14)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
(2 more...)

Transfer of Value Functions via Variational Methods

Tirinzoni, Andrea, Sanchez, Rafael Rodriguez, Restelli, Marcello

We consider the problem of transferring value functions in reinforcement learning. We propose an approach that uses the given source tasks to learn a prior distribution over optimal value functions and provide an efficient variational approximation of the corresponding posterior in a new target task. We show our approach to be general, in the sense that it can be combined with complex parametric function approximators and distribution models, while providing two practical algorithms based on Gaussians and Gaussian mixtures. We theoretically analyze them by deriving a finite-sample analysis and provide a comprehensive empirical evaluation in four different domains.

algorithm, artificial intelligence, bayesian inference, (18 more...)

Country:

North America > United States (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Data Science > Data Mining (0.93)

arXiv.org Machine Learning, May-28-2018

Importance Weighted Transfer of Samples in Reinforcement Learning

Tirinzoni, Andrea, Sessa, Andrea, Pirotta, Matteo, Restelli, Marcello

We consider the transfer of experience samples (i.e., tuples < s, a, s', r >) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus on selecting the most relevant source samples for solving the target task, but then all the transferred samples are used without further considering the discrepancies between the task models. In this paper, we propose a model-based technique that automatically estimates the relevance (importance weight) of each source sample for solving the target task. In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight. By extending the results for importance weighting provided in the supervised learning literature, we develop a finite-sample analysis of the proposed batch RL algorithm. Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task.
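The role of an importance weight can be illustrated with a minimal sketch. It assumes, for illustration only, that the reward models of the source and target tasks at a given state-action pair are known Gaussians with a shared standard deviation; the paper instead estimates the task models. The weight is the density ratio between target and source, and each sample contributes to an estimate in proportion to it. All names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weight(reward, target_mean, source_mean, std):
    """Density ratio p_target(reward) / p_source(reward)."""
    return gaussian_pdf(reward, target_mean, std) / gaussian_pdf(reward, source_mean, std)


def weighted_mean(values, weights):
    """Self-normalized weighted average of the transferred samples."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# A source reward of 0.0 is more likely under the target model (mean 0.0)
# than under the source model (mean 1.0), so its weight exceeds one.
w = importance_weight(reward=0.0, target_mean=0.0, source_mean=1.0, std=1.0)
```

Samples that are unlikely under the target model receive weights close to zero, which is one way to read the robustness to negative transfer reported in the abstract.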

artificial intelligence, reinforcement learning, upstream oil & gas, (17 more...)

1805.10886

Country:

North America > United States (0.14)
North America > Canada > Alberta (0.14)
Europe > Sweden (0.14)
(2 more...)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Energy > Oil & Gas > Upstream (0.46)
Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)