Collaborating Authors

 Curi, Sebastian


Gradient-Based Trajectory Optimization With Learned Dynamics

arXiv.org Artificial Intelligence

Trajectory optimization methods have achieved an exceptional level of performance on real-world robots in recent years. These methods heavily rely on accurate analytical models of the dynamics, yet some aspects of the physical world can only be captured to a limited extent. An alternative approach is to leverage machine learning techniques to learn a differentiable dynamics model of the system from data. In this work, we use trajectory optimization and model learning to perform highly dynamic and complex tasks with robotic systems in the absence of accurate analytical models of the dynamics. We show that a neural network can model highly nonlinear behaviors accurately over large time horizons, from data collected in only 25 minutes of interactions on two distinct robots: (i) the Boston Dynamics Spot and (ii) a radio-controlled (RC) car. Furthermore, we use the gradients of the neural network to perform gradient-based trajectory optimization. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and the RC car, and gives good performance in combination with trajectory optimization methods.
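As a rough sketch of this pipeline (not the paper's implementation), the snippet below unrolls a small learned dynamics network over a horizon and optimizes an action sequence by gradient descent through the model; the network architecture, cost function, and dimensions are placeholders.

    import torch
    import torch.nn as nn

    # Hypothetical learned dynamics model: predicts the next state from (state, action).
    dynamics = nn.Sequential(nn.Linear(4 + 2, 64), nn.Tanh(), nn.Linear(64, 4))

    def rollout_cost(x0, actions, goal):
        """Unroll the learned model over the horizon and accumulate a cost."""
        x, cost = x0, 0.0
        for a in actions:                      # actions: (horizon, action_dim)
            x = dynamics(torch.cat([x, a]))    # one-step prediction
            cost = cost + torch.sum((x - goal) ** 2) + 1e-3 * torch.sum(a ** 2)
        return cost

    x0 = torch.zeros(4)
    goal = torch.ones(4)
    actions = torch.zeros(20, 2, requires_grad=True)  # the decision variables
    opt = torch.optim.Adam([actions], lr=0.05)

    for _ in range(200):        # gradient-based trajectory optimization
        opt.zero_grad()
        loss = rollout_cost(x0, actions, goal)
        loss.backward()         # gradients flow through the learned model
        opt.step()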


Get Back Here: Robust Imitation by Return-to-Distribution Planning

arXiv.org Artificial Intelligence

Imitation Learning (IL) is a paradigm in sequential decision making where an agent uses offline expert trajectories to mimic the expert's behavior [1]. While Reinforcement Learning (RL) requires an additional reward signal that can be hard to specify in practice, IL only requires expert trajectories, which can be easier to collect. In part due to its simplicity, IL has been applied successfully in several real-world tasks, from robotic manipulation [2, 3, 4] to autonomous driving [5, 6]. A key challenge in deploying IL, however, is that the agent may encounter states in the final deployment environment that were not labeled by the expert offline [7]. In applications such as healthcare [8, 9] and robotics [10, 11], online experimentation can be risky (e.g., on human patients) or costly to label (e.g., off-policy robotic datasets can take months to collect).
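A minimal sketch of the IL paradigm described here, plain behavior cloning rather than the paper's return-to-distribution method; the expert dataset, network, and dimensions are made up.

    import torch
    import torch.nn as nn

    # Placeholder expert trajectories: states and the actions the expert took.
    expert_states = torch.randn(1000, 8)
    expert_actions = torch.randn(1000, 2)

    policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    for _ in range(500):  # behavior cloning: regress actions onto expert states
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(expert_states), expert_actions)
        loss.backward()
        opt.step()
    # At deployment, states outside the expert data can still cause compounding
    # drift -- the covariate-shift problem the paragraph above describes.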


Constrained Policy Optimization via Bayesian World Models

arXiv.org Artificial Intelligence

Improving sample efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes real-world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective while respecting pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
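The central mechanism, optimistic estimates of the objective and pessimistic estimates of the safety cost derived from posterior samples, can be caricatured in a few lines; the numbers and the ensemble-of-world-models reading are assumptions for illustration.

    import numpy as np

    # Hypothetical returns/costs of one policy under dynamics models sampled
    # from a Bayesian posterior (e.g., an ensemble of world models).
    sampled_returns = np.array([3.1, 2.7, 3.5, 2.9, 3.3])
    sampled_costs = np.array([0.4, 0.9, 0.5, 0.7, 0.6])

    optimistic_return = sampled_returns.max()  # upper bound on the objective
    pessimistic_cost = sampled_costs.max()     # upper bound on the safety cost

    budget = 0.8  # CMDP constraint threshold (made up)
    feasible = pessimistic_cost <= budget      # only keep policies deemed safe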


Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In real-world tasks, reinforcement learning (RL) agents frequently encounter situations that are not present during training time. To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations. The robust RL framework addresses this challenge via a worst-case optimization between an agent and an adversary. Previous robust RL algorithms are either sample inefficient, lack robustness guarantees, or do not scale to large problems. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem while attaining near-optimal sample complexity guarantees. RH-UCRL is a model-based reinforcement learning (MBRL) algorithm that effectively distinguishes between epistemic and aleatoric uncertainty and efficiently explores both the agent and adversary decision spaces during policy learning. We scale RH-UCRL to complex tasks via neural network ensemble models as well as neural network policies. Experimentally, we demonstrate that RH-UCRL outperforms other robust deep RL algorithms in a variety of adversarial environments.
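A one-step caricature of the robust objective (not RH-UCRL itself): the agent maximizes while the adversary minimizes, with the agent optimistic about a hypothetical epistemic-uncertainty term so that poorly known regions are still explored.

    import numpy as np

    agent_actions = np.linspace(-1.0, 1.0, 21)
    adversary_actions = np.linspace(-0.5, 0.5, 11)
    beta = 1.0  # confidence parameter

    def mean_return(a, w):    # hypothetical model-mean return
        return -(a - 0.3) ** 2 - a * w

    def epistemic_std(a, w):  # hypothetical epistemic uncertainty
        return 0.1 + 0.2 * abs(a * w)

    # max-min over an optimistic value estimate: the agent gets the benefit of
    # the doubt from epistemic uncertainty, the adversary plays worst case.
    robust_value = max(
        min(mean_return(a, w) + beta * epistemic_std(a, w)
            for w in adversary_actions)
        for a in agent_actions
    )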


Logistic Q-Learning

arXiv.org Artificial Intelligence

We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The method is closely related to the classic Relative Entropy Policy Search (REPS) algorithm of Peters et al. (2010), with the key difference that our method (called Q-REPS) introduces a Q-function that enables an efficient, exact, model-free implementation. While REPS is elegantly derived from a principled linear-programming (LP) formulation of optimal control in MDPs, it has the serious shortcoming that its faithful implementation requires access to the true MDP for both the policy evaluation and improvement steps, even at deployment time. The usual way to address this limitation is to use an empirical approximation to the policy evaluation step and to project the policy from the improvement step into a parametric space (Deisenroth et al., 2013), losing all the theoretical guarantees of REPS in the process.
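For reference, the LP formulation of optimal control that both methods start from can be written in a standard discounted form (our notation, a sketch rather than the paper's exact statement):

    \max_{d \ge 0} \; \sum_{s,a} d(s,a)\, r(s,a)
    \quad \text{s.t.} \quad
    \sum_{a'} d(s',a') = (1-\gamma)\,\nu_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, d(s,a) \quad \forall s',

where d is the discounted state-action occupancy measure, \nu_0 the initial-state distribution, and P the transition kernel. Regularizing such an LP with relative-entropy terms is what, per the abstract above, introduces the Q-function and the exact model-free updates of Q-REPS.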


Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

arXiv.org Machine Learning

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty. However, while most algorithms distinguish these two uncertainties for learning the model, they ignore this distinction when optimizing the policy. In this paper, we show that ignoring the epistemic uncertainty leads to greedy algorithms that do not explore sufficiently. In turn, we propose a practical optimistic-exploration algorithm (H-UCRL), which enlarges the input space with hallucinated inputs that can exert as much control as the epistemic uncertainty in the model affords. We analyze this setting and construct a general regret bound for well-calibrated models, which is provably sublinear in the case of Gaussian Process models. Based on this theoretical foundation, we show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms and different probabilistic models. Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions, a setting that is notoriously difficult for existing model-based reinforcement learning algorithms.
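A minimal sketch of the hallucinated-inputs idea under stated assumptions: the model and dimensions are hypothetical, and the extra input eta in [-1, 1]^d lets a policy steer the prediction anywhere inside the model's epistemic confidence interval.

    import numpy as np

    beta = 2.0  # confidence-interval scaling

    def hallucinated_step(mean_fn, std_fn, state, action, eta):
        """One-step hallucinated dynamics: eta in [-1, 1]^d selects a
        plausible next state within the epistemic confidence set."""
        mu = mean_fn(state, action)    # model's mean prediction
        sigma = std_fn(state, action)  # epistemic std (e.g., ensemble spread)
        return mu + beta * sigma * np.clip(eta, -1.0, 1.0)

    # Hypothetical toy model of a 2-D system.
    mean_fn = lambda s, a: s + 0.1 * a
    std_fn = lambda s, a: 0.05 * np.ones_like(s)

    s, a = np.zeros(2), np.array([1.0, 0.0])
    eta = np.array([1.0, -1.0])  # chosen by the hallucination policy
    s_next = hallucinated_step(mean_fn, std_fn, s, a, eta)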


Adaptive Sampling for Stochastic Risk-Averse Learning

arXiv.org Machine Learning

We consider the problem of training machine learning models in a risk-averse manner. In particular, we propose an adaptive sampling algorithm for stochastically optimizing the Conditional Value-at-Risk (CVaR) of a loss distribution. We use a distributionally robust formulation of the CVaR to phrase the problem as a zero-sum game between two players. Our approach solves the game using an efficient no-regret algorithm for each player. Critically, we can apply these algorithms to large-scale settings because the implementation relies on sampling from Determinantal Point Processes. Finally, we empirically demonstrate its effectiveness on large-scale convex and non-convex learning tasks.
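For concreteness, the quantity being optimized has a simple empirical form; the sketch below computes CVaR as the mean of the worst alpha-fraction of losses (equivalently, the worst-case expectation over reweightings q with q_i <= 1/(alpha n), the distributionally robust view mentioned above), ignoring the fractional boundary term.

    import numpy as np

    def empirical_cvar(losses, alpha):
        """CVaR_alpha: mean of the worst ceil(alpha * n) losses."""
        losses = np.sort(losses)[::-1]  # largest losses first
        k = max(1, int(np.ceil(alpha * len(losses))))
        return losses[:k].mean()

    losses = np.random.randn(1000) ** 2
    print(empirical_cvar(losses, alpha=0.1))  # mean of the worst 10%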


Structured Variational Inference in Unstable Gaussian Process State Space Models

arXiv.org Machine Learning

Gaussian processes are expressive, non-parametric statistical models that are well suited to learning nonlinear dynamical systems. However, large-scale inference in these state-space models is a challenging problem. In this paper, we propose CBF-SSM, a scalable model that employs a structured variational approximation to maintain temporal correlations. In contrast to prior work, our approach applies to the important class of unstable systems, where state uncertainty grows unbounded over time. For these systems, our method contains a probabilistic, model-based backward pass that infers latent states during training. We demonstrate state-of-the-art performance in our experiments. Moreover, we show that CBF-SSM can be combined with physical models in the form of ordinary differential equations to learn a reliable model of a physical flying robotic vehicle.
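To see why the unstable case is hard, consider the scalar linear-Gaussian system x_{t+1} = a x_t + w_t with |a| > 1 (a toy illustration, not the paper's model): the forward-propagated state variance grows geometrically, which is the regime the backward pass is meant to handle by conditioning on observations.

    # Variance recursion of x_{t+1} = a * x_t + w_t with w_t ~ N(0, q).
    a, q = 1.2, 0.1   # |a| > 1: unstable system
    var = 0.0
    for t in range(30):
        var = a * a * var + q  # grows like a**(2t); converges only if |a| < 1
    print(var)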


Safe Contextual Bayesian Optimization for Sustainable Room Temperature PID Control Tuning

arXiv.org Machine Learning

We tune one of the most common heating, ventilation, and air conditioning (HVAC) control loops, namely the temperature control of a room. For economic and environmental reasons, it is of prime importance to optimize the performance of this system. Buildings account for 20 to 40% of a country's energy consumption, and almost 50% of that comes from HVAC systems. Scenario projections predict a 30% decrease in heating consumption by 2050 due to efficiency increases. Advanced control techniques can improve performance; however, proportional-integral-derivative (PID) control is typically used due to its simplicity and overall performance. We use Safe Contextual Bayesian Optimization to optimize the PID parameters without human intervention. We reduce costs by 32% compared to the current PID controller settings while assuring safety and comfort for people in the room. The results of this work have an immediate impact on the room control loop's performance and its related commissioning costs. Furthermore, this successful attempt paves the way for further use at different levels of HVAC systems, with promising energy, operational, and commissioning cost savings, and it is a practical demonstration of the positive effects that Artificial Intelligence can have on environmental sustainability.
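For context, the controller being tuned and the outer tuning loop look roughly like this; the PID class is standard, while the safe-BO step is only a schematic placeholder (safe_bo, evaluate, and the context variable are hypothetical names, with the real acquisition following the cited Safe Contextual Bayesian Optimization method).

    class PID:
        """Discrete-time PID controller, e.g., for room temperature."""
        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral, self.prev_err = 0.0, 0.0

        def step(self, setpoint, measurement):
            err = setpoint - measurement
            self.integral += err * self.dt
            deriv = (err - self.prev_err) / self.dt
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    # Schematic outer loop: propose PID gains only from the region the safety
    # model currently certifies as comfortable, then update the model.
    # for t in range(budget):
    #     kp, ki, kd = safe_bo.propose(context=outdoor_temperature)
    #     cost, comfort_violation = evaluate(PID(kp, ki, kd, dt=60.0))
    #     safe_bo.update((kp, ki, kd), cost, comfort_violation)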


Online Variance Reduction with Mixtures

arXiv.org Machine Learning

Adaptive importance sampling for stochastic optimization is a promising approach that offers improved convergence through variance reduction. In this work, we propose a new framework for variance reduction that enables the use of mixtures over predefined sampling distributions, which can naturally encode prior knowledge about the data. While these sampling distributions are fixed, the mixture weights are adapted during the optimization process. We propose VRM, a novel and efficient adaptive scheme that asymptotically recovers the best mixture weights in hindsight and can also accommodate sampling distributions over sets of points. We empirically demonstrate the versatility of VRM in a range of applications.
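A hedged sketch of the mixture structure under stated assumptions (the update rule below is a simplified exponentiated-gradient step, not VRM's actual no-regret scheme, and the per-point gradient norms are made up): fixed component distributions, adaptive weights, and importance weighting to keep estimates unbiased.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    # Two fixed, predefined sampling distributions over n data points:
    # uniform, and one biased toward later points (encoding prior knowledge).
    p1 = np.full(n, 1.0 / n)
    p2 = np.arange(1, n + 1, dtype=float)
    p2 /= p2.sum()
    components = np.stack([p1, p2])
    w = np.array([0.5, 0.5])               # adaptive mixture weights

    grad_norms = np.linspace(0.1, 2.0, n)  # stand-in per-point gradient norms

    for t in range(2000):
        q = w @ components                 # current mixture distribution
        i = rng.choice(n, p=q)
        g2 = grad_norms[i] ** 2
        # Unbiased estimate of d/dw_k of the variance proxy sum_i g_i^2 / q_i.
        grad_w = -g2 * components[:, i] / q[i] ** 3
        w *= np.exp(-1e-5 * grad_w)        # tiny step: estimates are heavy-tailed
        w /= w.sum()
    # w now favors the component that samples high-gradient points more often.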