Parisi, Simone
Model-Based Exploration in Monitored Markov Decision Processes
Kazemipour, Alireza, Parisi, Simone, Taylor, Matthew E., Bowling, Michael
A tenet of reinforcement learning is that rewards are always observed by the agent. However, this is not true in many realistic settings, e.g., a human observer may not always be able to provide rewards, a sensor to observe rewards may be limited or broken, or rewards may be unavailable during deployment. Monitored Markov decision processes (Mon-MDPs) have recently been proposed as a model of such settings. Yet, Mon-MDP algorithms developed thus far do not fully exploit the problem structure, cannot take advantage of a known monitor, have no worst-case guarantees for "unsolvable" Mon-MDPs without specific initialization, and only have asymptotic proofs of convergence. This paper makes three contributions. First, we introduce a model-based algorithm for Mon-MDPs that addresses all of these shortcomings. The algorithm uses two instances of model-based interval estimation, one to guarantee that observable rewards are indeed observed, and another to learn the optimal policy. Second, empirical results demonstrate these advantages, showing faster convergence than prior algorithms in over two dozen benchmark settings, and even more dramatic improvements when the monitor process is known. Third, we present the first finite-sample bound on performance and show convergence to an optimal worst-case policy when some rewards are never observable.
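To make the construction above concrete, here is a minimal sketch in the general spirit of model-based interval estimation: optimistic value estimates computed from an empirical model together with a count of how often the reward was actually observed. The function, its arguments, and the bonus form are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def optimistic_q_values(trans_counts, reward_sum, reward_obs,
                        gamma=0.95, beta=1.0, r_max=1.0, n_iters=500):
    """Illustrative sketch (not the paper's algorithm): optimistic Q-values
    from an empirical model, with an exploration bonus that shrinks as more
    rewards are observed for each state-action pair."""
    n_states, n_actions, _ = trans_counts.shape   # trans_counts: (S, A, S)
    # Empirical transition probabilities; uniform for unvisited (s, a) pairs.
    n_sa = trans_counts.sum(axis=2, keepdims=True)
    p_hat = np.where(n_sa > 0, trans_counts / np.maximum(n_sa, 1), 1.0 / n_states)
    # Mean reward over *observed* rewards only; never-observed pairs stay at r_max.
    r_hat = np.where(reward_obs > 0, reward_sum / np.maximum(reward_obs, 1), r_max)
    # Confidence-interval-style bonus: large when few rewards were observed.
    r_opt = np.minimum(r_hat + beta / np.sqrt(np.maximum(reward_obs, 1)), r_max)
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):                      # value iteration on the model
        q = r_opt + gamma * p_hat @ q.max(axis=1)
    return q
```

The abstract describes two such instances working together: one drives the agent toward state-action pairs whose rewards can still be observed, and the other uses the observed rewards to learn the optimal policy.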
Monitored Markov Decision Processes
Parisi, Simone, Mohammedalamen, Montaser, Kazemipour, Alireza, Taylor, Matthew E., Bowling, Michael
In reinforcement learning (RL), an agent learns to perform a task by interacting with an environment and receiving feedback (a numerical reward) for its actions. However, the assumption that rewards are always observable is often not applicable in real-world problems. For example, the agent may need to ask a human to supervise its actions or activate a monitoring system to receive feedback. There may even be a period of time before rewards become observable, or a period of time after which rewards are no longer given. In other words, there are cases where the environment generates rewards in response to the agent's actions but the agent cannot observe them. In this paper, we formalize a novel but general RL framework - Monitored MDPs - where the agent cannot always observe rewards. We discuss the theoretical and practical consequences of this setting, show the challenges that arise even in toy environments, and propose algorithms that begin to tackle this setting. This paper introduces a powerful new formalism that encompasses both new and existing problems and lays the foundation for future research.
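One way to picture the setting is an environment that always generates a reward, of which the agent only sees a monitored subset. Below is a minimal sketch assuming a Gym-style step API; the wrapper, its attribute names, and the use of None for an unobserved reward are illustrative assumptions, not the paper's formalism.

```python
class MonitoredEnv:
    """Illustrative wrapper: the environment still generates a reward every
    step, but the agent observes it only when the monitor is active."""

    def __init__(self, env, monitor_on):
        self.env = env                # Gym-style environment (assumed API)
        self.monitor_on = monitor_on  # callable: (observation, action) -> bool
        self._obs = None

    def reset(self):
        self._obs = self.env.reset()
        return self._obs

    def step(self, action):
        next_obs, reward, done, info = self.env.step(action)
        # The reward is generated regardless, but it is hidden (None) from
        # the agent unless the monitor observes this (observation, action) pair.
        observed = reward if self.monitor_on(self._obs, action) else None
        self._obs = next_obs
        return next_obs, observed, done, info
```

In this sketch, an always-true monitor_on recovers the standard MDP setting, while an always-false one yields interaction with no observable rewards at all.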
TD-Regularized Actor-Critic Methods
Parisi, Simone, Tangkaratt, Voot, Peters, Jan, Khan, Mohammad Emtiyaz
Actor-critic methods can achieve strong performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by avoiding large steps in the actor update whenever the critic is highly inaccurate. The resulting method, which we call the TD-regularized actor-critic method, is a simple plug-and-play approach to improve stability and overall performance of actor-critic methods. Evaluations on standard benchmarks confirm these benefits.
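Roughly, and using illustrative symbols rather than the paper's exact notation, the regularization amounts to subtracting a penalty proportional to the critic's squared TD error from the actor's objective:

\[
\delta = r + \gamma V_w(s') - V_w(s), \qquad
\hat{J}_{\mathrm{TD}}(\theta) = J(\theta) - \eta\,\mathbb{E}\!\left[\delta^2\right],
\]

where J(θ) is the usual actor objective, V_w is the critic, and η weighs the penalty. When the critic fits poorly (large δ), the penalty dominates and the actor update is damped; as the critic improves, the penalty vanishes and the original objective is recovered.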
Policy Search with High-Dimensional Context Variables
Tangkaratt, Voot (The University of Tokyo) | Hoof, Herke van (McGill University) | Parisi, Simone (Technical University of Darmstadt) | Neumann, Gerhard (University of Lincoln) | Peters, Jan (Max Planck Institute for Intelligent Systems) | Sugiyama, Masashi (The University of Tokyo)
Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored. In this paper, we propose a contextual policy search method in the model-based relative entropy stochastic search framework with integrated dimensionality reduction. We learn a model of the reward that is locally quadratic in both the policy parameters and the context variables. Furthermore, we perform supervised linear dimensionality reduction on the context variables by nuclear norm regularization. The experimental results show that the proposed method outperforms naive dimensionality reduction via principal component analysis and a state-of-the-art contextual policy search method.
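The two modeling choices can be written compactly. As a hedged sketch with illustrative symbols (the paper's exact parameterization may differ): the reward is modeled as locally quadratic in the policy parameters θ and a linear projection Wc of the context, and W is fit with a nuclear-norm penalty that encourages a low-rank, task-relevant projection:

\[
\hat{R}(\theta, c) = \begin{pmatrix} \theta \\ W c \end{pmatrix}^{\!\top}
A \begin{pmatrix} \theta \\ W c \end{pmatrix}
+ b^{\top} \begin{pmatrix} \theta \\ W c \end{pmatrix} + r_0,
\qquad
\min_{A,\, b,\, r_0,\, W} \; \sum_{i} \big(\hat{R}(\theta_i, c_i) - R_i\big)^2
+ \lambda \lVert W \rVert_*,
\]

where \(\lVert W \rVert_*\) is the nuclear norm (the sum of singular values) and λ trades model fit against the effective dimensionality of the learned context projection. Unlike unsupervised PCA, the projection is chosen using the reward signal, so task-relevant directions of the context are preserved.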
Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation
Pirotta, Matteo (Politecnico di Milano) | Parisi, Simone (Politecnico di Milano) | Restelli, Marcello (Politecnico di Milano)
This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto-optimal ones. Unlike previous policy-gradient multi-objective algorithms, where n optimization routines are used to obtain n solutions, our approach performs a single gradient-ascent run that at each step generates an improved continuous approximation of the Pareto frontier. The idea is to use a gradient-based approach to optimize the parameters of a function that defines a manifold in the policy-parameter space, so that the corresponding image in the objective space gets as close as possible to the Pareto frontier. Besides deriving how to compute and estimate such a gradient, we also discuss the non-trivial issue of defining a metric to assess the quality of candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two MOMDPs.
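The single-run idea can be summarized with one parameterized map. As a hedged sketch in illustrative notation (the paper's symbols may differ): a function φ_ρ maps a low-dimensional variable t ∈ T to policy parameters, the candidate frontier is the image of this manifold under the vector of expected returns, and ρ is updated by gradient ascent on a metric that scores that image:

\[
\phi_{\rho} : t \mapsto \theta, \qquad
\mathcal{F}(\rho) = \left\{ \mathbf{J}\big(\phi_{\rho}(t)\big) : t \in T \right\}, \qquad
\rho_{k+1} = \rho_k + \alpha\, \nabla_{\rho}\, \mathcal{I}\big(\mathcal{F}(\rho_k)\big),
\]

where \(\mathbf{J}(\theta)\) collects the expected returns of policy \(\pi_\theta\) for each objective and \(\mathcal{I}\) measures how close the candidate frontier \(\mathcal{F}(\rho)\) is to the Pareto frontier; choosing \(\mathcal{I}\) is precisely the non-trivial metric-design issue the abstract mentions.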