Plotting

 Matteo Turchetta


Safe Exploration for Interactive Machine Learning

Neural Information Processing Systems

In Interactive Machine Learning (IML), we iteratively make decisions and obtain noisy observations of an unknown function. While IML methods, e.g., Bayesian optimization and active learning, have been successful in applications, on real-world systems they must provably avoid unsafe decisions. To this end, safe IML algorithms must carefully learn about a priori unknown constraints without making unsafe decisions. Existing algorithms for this problem learn about the safety of all decisions to ensure convergence. This is sample-inefficient, as it explores decisions that are not relevant for the original IML objective.
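
The safe-decision rule that such algorithms build on can be pictured with a Gaussian process surrogate of the unknown constraint: a decision is only considered if its pessimistic (lower) confidence bound still satisfies the safety threshold. Below is a minimal sketch of that rule, assuming a scikit-learn GP, an illustrative RBF kernel, and placeholder values for the confidence scale beta and threshold h; it is not the paper's exact algorithm.

```python
# Minimal sketch (not the paper's algorithm): certify decisions as safe when the
# GP's pessimistic confidence bound on the unknown constraint clears a threshold.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

candidates = np.linspace(0.0, 1.0, 50).reshape(-1, 1)   # decision space
X_seed = np.array([[0.5]])                               # known-safe seed decision
y_seed = np.array([1.0])                                 # noisy constraint observation
h, beta = 0.0, 2.0                                       # illustrative threshold / scale

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
gp.fit(X_seed, y_seed)

mean, std = gp.predict(candidates, return_std=True)
lower_bound = mean - beta * std            # pessimistic estimate of the constraint
safe_set = candidates[lower_bound >= h]    # decisions certified safe w.h.p.
print(f"{len(safe_set)} of {len(candidates)} candidates certified safe")
```

The sample-efficiency point in the abstract then amounts to exploring only the decisions within this certified set that matter for the original IML objective, rather than certifying every decision.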


NeurIPS22_data_benchmarks

Neural Information Processing Systems

This means that shorter time horizons train for more episodes. Regardless of the training setup, we evaluate on the random-weather setting. When evaluating trained policies on the test-time, test-location, and test-horizon generalization tasks, we use 20 repetitions. We report the performance on these generalization tasks for the final policy obtained at the end of training.
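
As a concrete illustration of this setup, the sketch below shows how a fixed interaction budget would translate into episode counts for different horizons and how a 20-repetition evaluation loop might look; the budget, horizons, and evaluate_policy stub are hypothetical placeholders, not the benchmark's actual values.

```python
# Hypothetical sketch: a fixed step budget yields more episodes for shorter
# horizons, and trained policies are evaluated with 20 repetitions.
import random
import statistics

budget = 100_000                      # illustrative total environment steps
for horizon in (50, 100, 200):
    print(f"horizon {horizon}: {budget // horizon} training episodes")

def evaluate_policy(policy, seed):
    # Placeholder for a rollout under the random-weather evaluation setting.
    random.seed(seed)
    return random.gauss(0.0, 1.0)

returns = [evaluate_policy(policy=None, seed=rep) for rep in range(20)]
print("mean return over 20 repetitions:", statistics.mean(returns))
```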



NeurIPS20_SafeCL

Neural Information Processing Systems

In this section, we report the hyperparameters that we use for the students, which are CMDP solvers based on an online version of [30], and for the teachers, which are based on the GP-UCB algorithm for multi-armed bandits [44].

A.1 Students

The students comprise two components: an unconstrained RL solver and a no-regret online optimizer. The first component is used to solve the unconstrained RL problem that results from optimizing the Lagrangian of a given CMDP for a fixed value of the Lagrange multipliers. For this, we use the Stable Baselines [25] implementation of the Proximal Policy Optimization (PPO) algorithm [43]. The second component is used to adapt the Lagrange multipliers online.
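
To make the two-component structure concrete, here is a minimal sketch, under stated assumptions, of alternating an unconstrained policy update on the Lagrangian with a projected-gradient update of the Lagrange multiplier. The functions ppo_update and estimate_constraint_cost are hypothetical placeholders rather than the Stable Baselines API, and the cost limit and step size are illustrative.

```python
# Sketch of the student's loop: an unconstrained RL step on the Lagrangian,
# followed by an online (projected gradient ascent) Lagrange-multiplier update.
import random

def ppo_update(policy, lam):
    # Placeholder for one PPO iteration on reward - lam * constraint cost.
    return policy

def estimate_constraint_cost(policy):
    # Placeholder for a Monte Carlo estimate of the expected constraint cost.
    return random.uniform(0.0, 1.0)

policy, lam = None, 0.0
cost_limit, step_size = 0.25, 0.05
for _ in range(100):
    policy = ppo_update(policy, lam)                 # unconstrained RL solver
    violation = estimate_constraint_cost(policy) - cost_limit
    lam = max(0.0, lam + step_size * violation)      # no-regret online optimizer
print("final Lagrange multiplier:", round(lam, 3))
```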



NeurIPS20_SafeCL

Neural Information Processing Systems

In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly. In such settings, the agent needs to behave safely not only after but also while learning. To achieve this, existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations during exploration with high probability, but both the probabilistic guarantees and the smoothness assumptions inherent in the priors are not viable in many scenarios of interest, such as autonomous driving. This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor that saves the agent from violating constraints during learning. In this new model, the instructor needs to know neither how to do well at the task the agent is learning nor how the environment works. Instead, it has a library of reset controllers that it activates when the agent starts behaving dangerously, preventing it from doing damage. Crucially, the choices of which reset controller to apply in which situation affect the speed of agent learning. Based on observing the agent's progress, the teacher itself learns a policy (a curriculum) for choosing the reset controllers so as to optimize the agent's final policy reward. Our experiments use this framework in two challenging environments to induce curricula for safe and efficient learning.
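
The teacher's selection step can be pictured as a bandit over intervention types. The sketch below uses a standard UCB rule in place of the paper's GP-UCB teacher, with hypothetical controller names and a placeholder progress signal; it only illustrates the idea of choosing interventions based on the student's observed progress.

```python
# Illustrative UCB-style teacher choosing among hypothetical reset controllers
# based on a placeholder "student progress" signal (the paper uses GP-UCB).
import math
import random

controllers = ["reset_to_start", "reset_to_safe_region", "slow_down"]
counts = {c: 0 for c in controllers}
totals = {c: 0.0 for c in controllers}

def observed_progress(controller):
    # Placeholder: noisy improvement of the student after this intervention.
    base = {"reset_to_start": 0.2, "reset_to_safe_region": 0.5, "slow_down": 0.3}
    return base[controller] + random.gauss(0.0, 0.1)

def ucb_score(c, t):
    # Empirical mean plus an exploration bonus (infinite if never tried).
    if counts[c] == 0:
        return float("inf")
    return totals[c] / counts[c] + math.sqrt(2 * math.log(t) / counts[c])

for t in range(1, 201):
    choice = max(controllers, key=lambda c: ucb_score(c, t))
    counts[choice] += 1
    totals[choice] += observed_progress(choice)

print("intervention counts:", counts)
```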


Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Neural Information Processing Systems

In classical reinforcement learning, agents accept arbitrary short-term loss for long-term gain when exploring their environment. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure or harm the environment. In this paper, we address the problem of safely exploring finite Markov decision processes (MDPs). We define safety in terms of an a priori unknown safety constraint that depends on states and actions and satisfies certain regularity conditions expressed via a Gaussian process prior.
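
A minimal sketch of the core certification step, under stated assumptions: a GP is fit to a few noisy observations of the safety function over state-action pairs, and a pair is marked safe only if its pessimistic confidence bound clears the threshold. The grid size, Matern kernel, and threshold are illustrative, and the sketch omits the reachability and returnability checks required by the full algorithm.

```python
# Sketch: certify state-action pairs of a finite MDP as safe when the GP's
# pessimistic bound on the unknown safety function exceeds the threshold.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

n_states, n_actions = 10, 4
sa_pairs = np.array([[s, a] for s in range(n_states) for a in range(n_actions)], dtype=float)

# A few noisy observations of the safety function at already-visited pairs.
X_obs = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y_obs = np.array([0.8, 0.6, 0.7])

gp = GaussianProcessRegressor(kernel=Matern(length_scale=2.0, nu=2.5), alpha=1e-2)
gp.fit(X_obs, y_obs)

mean, std = gp.predict(sa_pairs, return_std=True)
beta, threshold = 2.0, 0.0
safe_mask = (mean - beta * std) >= threshold   # pessimistic safety certificate
print(f"certified safe pairs: {int(safe_mask.sum())} / {len(sa_pairs)}")
```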