Collaborating Authors: Buchli, Jonas


Preference Optimization as Probabilistic Inference

arXiv.org Machine Learning

The use of preference-annotated data for training machine learning models has a long history, going back to early algorithms for recommender systems and market research (Bonilla et al., 2010; Boutilier, 2002; Guo and Sanner, 2010). Preference optimization algorithms are currently receiving renewed attention because they are a natural candidate for shaping the outputs of deep learning systems, such as large language models (Ouyang et al., 2022; Team et al., 2024) or control policies, via human feedback (Azar et al., 2023; Christiano et al., 2017; Rafailov et al., 2023). Arguably, preference optimization is also a natural choice even when direct human feedback is not available but one instead aims to optimize a machine learning model based on feedback from a hand-coded or learned critic function that judges the desirability of solutions. In this setting preference optimization methods are useful because they let us optimize the model towards desired outcomes based on relative rankings between outcomes alone, rather than requiring absolute labels or carefully crafted reward functions. Among preference optimization approaches, those that use preference data directly - as opposed to casting preference optimization as reinforcement learning from (human) feedback - such as DPO (Rafailov et al., 2023), have emerged as particularly successful, since they only require access to an offline dataset of paired preference data and are fairly robust to the application domain and hyperparameter settings. However, algorithms in this class make specific assumptions tailored to their application domain: they were designed to optimize LLMs from human feedback in the form of comparisons between generated sentences and thus, by design, require paired preference data (since they directly model a specific choice of preference distribution). We are interested in algorithms that are more flexible and applicable in settings where the assumptions underlying DPO do not hold.
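
For concreteness, the paired-preference assumption discussed above can be illustrated with the standard DPO objective of Rafailov et al. (2023). The following is a minimal Python sketch, not the method proposed here; the array names and toy data are hypothetical, and the per-response log-probabilities are assumed to be precomputed under the trained policy and a frozen reference policy.

    import numpy as np

    def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
        # Implicit reward margin: how much more the policy (relative to the
        # reference) prefers the chosen response y_w over the rejected y_l.
        margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
        # -log sigmoid(beta * margin), written with logaddexp for numerical stability.
        return np.mean(np.logaddexp(0.0, -beta * margin))

    # Toy usage with random per-pair log-probabilities (hypothetical data).
    rng = np.random.default_rng(0)
    logp_w, logp_l = rng.normal(size=8), rng.normal(size=8)
    print(dpo_loss(logp_w, logp_l, logp_w - 0.1, logp_l + 0.1))

The point of the sketch is that the loss is defined only on pairs (y_w, y_l); without such paired comparisons the objective cannot be evaluated, which is exactly the restriction the abstract seeks to relax.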


Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, limiting the direct application of modern deep RL algorithms to often expensive or safety-critical hardware. In this work, we introduce "Box o Flows", a novel benchtop experimental control system for systematically evaluating RL algorithms in dynamic real-world scenarios. We describe the key components of the Box o Flows, and through a series of experiments demonstrate how state-of-the-art model-free RL algorithms can synthesize a variety of complex behaviors via simple reward specifications. Furthermore, we explore the role of offline RL in data-efficient hypothesis testing by reusing past experiences. We believe that the insights gained from this preliminary study and the availability of systems like the Box o Flows pave the way forward for developing systematic RL algorithms that can be generally applied to complex dynamical systems. Supplementary material and videos of experiments are available at https://sites.google.com/view/box-o-flows/home.
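
As a purely illustrative example of the kind of "simple reward specification" mentioned above, the sketch below scores a camera image by the fraction of pixels of a target colour inside a goal region. The task, colour threshold, and image layout are hypothetical and not taken from the paper.

    import numpy as np

    def region_colour_reward(rgb_image, goal_rows, goal_cols, target_rgb, tol=30.0):
        # Fraction of pixels in the goal region whose colour is within `tol`
        # (Euclidean distance in RGB space) of the target colour.
        region = rgb_image[goal_rows, goal_cols].astype(float)
        dist = np.linalg.norm(region - np.asarray(target_rgb, dtype=float), axis=-1)
        return float((dist < tol).mean())

    # Hypothetical usage: reward for orange pixels in the upper-middle of the frame.
    image = np.random.default_rng(0).integers(0, 256, size=(120, 160, 3))
    print(region_colour_reward(image, slice(10, 40), slice(60, 100), (255, 140, 0)))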


Towards practical reinforcement learning for tokamak magnetic control

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method: achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the time required to learn new tasks. We build on top of the work of Degrave et al. (2022) and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65% improvement in shape accuracy, achieve a substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results and point the way towards routinely achieving accurate discharges using the RL approach.
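
One generic way to attack the steady-state error mentioned above, sketched below for illustration only and not necessarily the mechanism used in this work, is to expose an integrated tracking error to the policy, analogous to the integral term of a PID controller. The class and field names are hypothetical.

    import numpy as np

    class IntegralErrorObservation:
        # Appends a (leaky) integral of the plasma-current tracking error to the
        # observation vector, so the policy can learn to cancel long-term bias.
        def __init__(self, dt, leak=0.999):
            self.dt = dt          # control-loop timestep in seconds
            self.leak = leak      # leak factor keeps the integrator bounded
            self.integral = 0.0

        def reset(self):
            self.integral = 0.0

        def augment(self, obs, ip_measured, ip_target):
            self.integral = self.leak * self.integral + (ip_target - ip_measured) * self.dt
            return np.concatenate([obs, [self.integral]])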


Shaking the foundations: delusions in sequence models for interaction and control

arXiv.org Artificial Intelligence

The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive, however, is purposeful adaptive behavior. Currently, there is a common perception that sequence models "lack the understanding of the cause and effect of their actions", leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals, respectively.
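
A common way to operationalize "actions as interventions" in an autoregressive model, shown below as a hedged sketch rather than the paper's exact training recipe, is to withhold the predictive (factual) loss on action tokens chosen by the agent itself, so that their values do not update the model's beliefs about the world, while keeping the loss on observation tokens.

    import numpy as np

    def masked_nll(log_probs, targets, is_action):
        # log_probs: [T, V] per-step log-probabilities over a vocabulary of size V.
        # targets:   [T]   integer token ids observed in the sequence.
        # is_action: [T]   True where the token is an action chosen by the agent.
        token_nll = -log_probs[np.arange(len(targets)), targets]
        keep = ~np.asarray(is_action)          # observations carry the factual signal
        return float(token_nll[keep].mean())   # intervened-on actions contribute no loss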


Local Search for Policy Iteration in Continuous Control

arXiv.org Artificial Intelligence

We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial. Quantitatively, our algorithm improves data efficiency on several continuous control benchmarks (when a model is learned in parallel), and it provides significant improvements in wall-clock time in high-dimensional domains (when a ground truth model is available). The unified framework also helps us to better understand the space of model-based and model-free algorithms. In particular, we demonstrate that some benefits attributed to model-based RL can be obtained without a model, simply by utilizing more computation.
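
For intuition on the KL-regularized policy improvement step that this framework builds on, the sketch below re-weights a handful of sampled candidate actions by exp(Q/eta); the resulting softmax weights define a locally improved policy that can serve as a supervised target or be used for action selection. This is a generic illustration under the stated assumptions, not the paper's exact algorithm, and the names and toy values are hypothetical.

    import numpy as np

    def local_improvement_weights(q_values, eta=1.0):
        # w_i proportional to exp(Q(s, a_i) / eta): higher-value sampled actions get
        # exponentially more weight; eta controls how far we move from the prior policy.
        z = q_values / eta
        z -= z.max()                  # subtract max for numerical stability
        w = np.exp(z)
        return w / w.sum()

    # Toy usage: critic values for four actions sampled from the current policy.
    print(local_improvement_weights(np.array([1.2, 0.4, 2.1, 0.9]), eta=0.5))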