AITopics | Metelli, Alberto Maria

Collaborating Authors

Metelli, Alberto Maria

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Simultaneously Updating All Persistence Values in Reinforcement Learning

Sabbioni, Luca, Daire, Luca Al, Bisi, Lorenzo, Metelli, Alberto Maria, Restelli, Marcello

arXiv.org Artificial IntelligenceNov-21-2022

In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization. Agents acting at high frequencies have the best control opportunities, along with some drawbacks, such as possible inefficient exploration and vanishing of the action advantages. The repetition of the actions, i.e., action persistence, comes into help, as it allows the agent to visit wider regions of the state space and improve the estimation of the action effects. In this work, we derive a novel All-Persistence Bellman Operator, which allows an effective use of both the low-persistence experience, by decomposition into sub-transition, and the high-persistence experience, thanks to the introduction of a suitable bootstrap procedure. In this way, we employ transitions collected at any time scale to update simultaneously the action values of the considered persistence set. We prove the contraction property of the All-Persistence Bellman Operator and, based on it, we extend classic Q-learning and DQN. After providing a study on the effects of persistence, we experimentally evaluate our approach in both tabular contexts and more challenging frameworks, including some Atari games.

machine learning, persistence, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2211.1162

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Storehouse: a Reinforcement Learning Environment for Optimizing Warehouse Management

Cestero, Julen, Quartulli, Marco, Metelli, Alberto Maria, Restelli, Marcello

arXiv.org Artificial IntelligenceJul-21-2022

Warehouse Management Systems have been evolving and improving thanks to new Data Intelligence techniques. However, many current optimizations have been applied to specific cases or are in great need of manual interaction. Here is where Reinforcement Learning techniques come into play, providing automatization and adaptability to current optimization policies. In this paper, we present Storehouse, a customizable environment that generalizes the definition of warehouse simulations for Reinforcement Learning. We also validate this environment against state-of-the-art reinforcement learning algorithms and compare these results to human and random policies.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2207.03851

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Education (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Policy Optimization as Online Learning with Mediator Feedback

Metelli, Alberto Maria, Papini, Matteo, D'Oro, Pierluca, Restelli, Marcello

arXiv.org Machine LearningDec-15-2020

Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available information, compared to the standard bandit feedback, allows reusing samples generated by one policy to estimate the performance of other policies. Based on this observation, we propose an algorithm, RANDomized-exploration policy Optimization via Multiple Importance Sampling with Truncation (RANDOMIST), for regret minimization in PO, that employs a randomized exploration strategy, differently from the existing optimistic approaches. When the policy space is finite, we show that under certain circumstances, it is possible to achieve constant regret, while always enjoying logarithmic regret. We also derive problem-dependent regret lower bounds. Then, we extend RANDOMIST to compact policy spaces. Finally, we provide numerical simulations on finite and compact policy spaces, in comparison with PO and bandit baselines.

computer based training, educational technology, mediator feedback, (23 more...)

arXiv.org Machine Learning

2012.08225

Country: Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.60)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.45)

Add feedback

Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Metelli, Alberto Maria, Likmeta, Amarildo, Restelli, Marcello

Neural Information Processing SystemsMar-18-2020, 22:16:05 GMT

How does the uncertainty of the value function propagate when performing temporal difference learning? In this paper, we address this question by proposing a Bayesian framework in which we employ approximate posterior distributions to model the uncertainty of the value function and Wasserstein barycenters to propagate it across state-action pairs. Leveraging on these tools, we present an algorithm, Wasserstein Q-Learning (WQL), starting in the tabular case and then, we show how it can be extended to deal with continuous domains. Furthermore, we prove that, under mild assumptions, a slight variation of WQL enjoys desirable theoretical properties in the tabular setting. Finally, we present an experimental campaign to show the effectiveness of WQL on finite problems, compared to several RL algorithms, some of which are specifically designed for exploration, along with some preliminary results on Atari games.

artificial intelligence, computer game, wasserstein barycenter, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Gradient-Aware Model-based Policy Search

D'Oro, Pierluca, Metelli, Alberto Maria, Tirinzoni, Andrea, Papini, Matteo, Restelli, Marcello

arXiv.org Artificial IntelligenceSep-9-2019

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains analyzing and discussing its properties.

artificial intelligence, null, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1909.04115

Country:

Europe (0.46)
North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Sports > Golf (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Policy Space Identification in Configurable Environments

Metelli, Alberto Maria, Manneschi, Guglielmo, Restelli, Marcello

arXiv.org Artificial IntelligenceSep-9-2019

We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy. We introduce an approach based on statistical testing to identify the set of policy parameters the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under different assumptions on the policy space, we provide a probabilistic analysis of the simplified one in the case of linear policies belonging to the exponential family. To improve the performance of our identification rules, we frame the problem in the recently introduced framework of the Configurable Markov Decision Processes, exploiting the opportunity of configuring the environment to induce the agent revealing which parameters it can control. Finally, we provide an empirical evaluation, on both discrete and continuous domains, to prove the effectiveness of our identification rules.

agent, artificial intelligence, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1909.03984

Country:

Europe (1.00)
North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Feature Selection via Mutual Information: New Theoretical Insights

Beraha, Mario, Metelli, Alberto Maria, Papini, Matteo, Tirinzoni, Andrea, Restelli, Marcello

arXiv.org Machine LearningJul-17-2019

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that conditional mutual information naturally arises when bounding the ideal regression/classification errors achieved by different subsets of features. Leveraging on these insights, we propose a novel stopping condition for backward and forward greedy methods which ensures that the ideal prediction error using the selected feature subset remains bounded by a user-specified threshold. We provide numerical simulations to support our theoretical claims and compare to common heuristic methods.

artificial intelligence, feature selection, machine learning, (19 more...)

arXiv.org Machine Learning

1907.07384

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Policy Optimization via Importance Sampling

Metelli, Alberto Maria, Papini, Matteo, Faccio, Francesco, Restelli, Marcello

Neural Information Processing SystemsDec-31-2018

Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires to account for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.

artificial intelligence, optimization, optimization problem, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.28)
Europe > Italy > Lombardy (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Policy Optimization via Importance Sampling

Metelli, Alberto Maria, Papini, Matteo, Faccio, Francesco, Restelli, Marcello

Neural Information Processing SystemsDec-31-2018

artificial intelligence, optimization, optimization problem, (19 more...)

Neural Information Processing Systems

Country: Europe > Italy > Lombardy (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Policy Optimization via Importance Sampling

Metelli, Alberto Maria, Papini, Matteo, Faccio, Francesco, Restelli, Marcello

arXiv.org Machine LearningSep-17-2018

Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating on-line and off-line optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial as it requires to account for the variance of the objective function estimate. In this paper, we propose a novel model-free policy search algorithm, POIS, applicable in both control-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation and then we define a surrogate objective function which is optimized off-line using a batch of trajectories. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with the state-of-the-art policy optimization methods.

neural network, optimization problem, variance, (20 more...)

arXiv.org Machine Learning

1809.06098

Country: Europe (0.46)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback