Heim, Steve
FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning
Li, Chenhao, Stanger-Jones, Elijah, Heim, Steve, Kim, Sangbae
Motion trajectories offer reliable references for physics-based motion learning but suffer from sparsity, particularly in regions that lack sufficient data coverage. To address this challenge, we introduce a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions. Modeling the motion dynamics in a continuously parameterized latent space enables our method to enhance the interpolation and generalization capabilities of motion learning algorithms. The motion learning controller, informed by the motion parameterization, performs online tracking of a wide range of motions, including targets unseen during training. With a fallback mechanism, the controller dynamically adapts its tracking strategy and automatically resorts to safe action execution when a potentially risky target is proposed. By leveraging the identified spatial-temporal structure, our work opens new possibilities for future advancements in general motion representation and learning algorithms.

The availability of reference trajectories, such as motion capture data, has significantly propelled the advancement of motion learning techniques (Peng et al., 2018; Bergamin et al., 2019; Peng et al., 2021; 2022; Starke et al., 2022; Li et al., 2023b;a). However, policies trained with these techniques generalize poorly to motions outside the distribution of the available data (Peng et al., 2020; Li et al., 2023a). A core reason is that, while the trajectories in the data are induced by some underlying dynamics of the system, the learned policies are typically trained only to replicate the data rather than to capture that dynamics structure. In other words, the policies attempt to memorize trajectory instances rather than learn to predict them systematically. Moreover, the high nonlinearity and the embedded high-level similarity of motions hinder data-driven methods from effectively identifying and modeling the dynamics of motion patterns (Peng et al., 2018). Addressing these challenges therefore requires systematically understanding and leveraging the structured nature of the motion space.

Instead of handling raw motion trajectories in a long-horizon, high-dimensional state space, structured representation methods introduce certain inductive biases during training and offer an efficient approach to managing complex movements (Min & Chai, 2012; Lee et al., 2021). These methods focus on extracting the essential features and temporal dependencies of motions, enabling more effective and compact representations (Lee et al., 2010; Levine et al., 2012). The ability to capture the spatial-temporal structure of the motion space offers enhanced interpolation and generalization capabilities that can augment training datasets and improve the effectiveness of motion generation algorithms (Holden et al., 2017; Iscen et al., 2018; Ibarz et al., 2021).
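Below is a minimal sketch of the kind of Fourier-based periodic parameterization described above, assuming a latent trajectory window is already available; the function names, the plain FFT-based extraction, and the phase-advance step are illustrative assumptions, not the exact FLD architecture.

import numpy as np

def fourier_parameterize(latent_window, dt):
    # latent_window: (T, C) array, one quasi-periodic latent trajectory per channel.
    T, C = latent_window.shape
    spectrum = np.fft.rfft(latent_window, axis=0)       # one-sided spectrum per channel
    freqs = np.fft.rfftfreq(T, d=dt)
    peak = np.argmax(np.abs(spectrum[1:]), axis=0) + 1  # dominant non-DC bin per channel
    frequency = freqs[peak]                             # (C,) dominant frequency
    amplitude = 2.0 * np.abs(spectrum[peak, np.arange(C)]) / T
    phase = np.angle(spectrum[peak, np.arange(C)])
    offset = np.real(spectrum[0]) / T                   # per-channel mean
    return frequency, amplitude, offset, phase

def advance_phase(phase, frequency, dt):
    # Latent dynamics for periodic motion: only the phase evolves;
    # frequency, amplitude, and offset stay (quasi-)constant.
    return np.mod(phase + 2.0 * np.pi * frequency * dt, 2.0 * np.pi)

def reconstruct(frequency, amplitude, offset, phase):
    # Evaluate the parameterized latent state at the current phase.
    return offset + amplitude * np.cos(phase)

A downstream tracking policy can then be conditioned on (frequency, amplitude, offset, phase) rather than on raw trajectory frames, which is what allows interpolation between motions in the parameterization space.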
Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback
Zhang, Jenny, Heim, Steve, Jeon, Se Hwan, Kim, Sangbae
We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact-state estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
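For concreteness, here is a minimal sketch of a decentralized phase oscillator with local ground-reaction-force feedback; the specific feedback form and gain are illustrative assumptions rather than the exact model used in the paper.

import numpy as np

def oscillator_step(phase, grf, dt, omega=2.0 * np.pi, k=1.0):
    # phase: (4,) oscillator phases, one per leg
    # grf:   (4,) normalized ground reaction forces measured at each foot
    # omega: nominal stepping frequency [rad/s]; k: local feedback gain
    # Each oscillator is driven by its nominal frequency plus a correction that
    # depends only on its own leg's contact force, acting like an observer update
    # that pulls the latent contact estimate toward the measured contact state.
    phase_dot = omega + k * grf * np.sin(phase)
    return np.mod(phase + phase_dot * dt, 2.0 * np.pi)

The per-leg phases are then exposed to the policy as observations, and simple phase-based rewards close the loop between the oscillator state and the learned controller.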
Benchmarking Potential Based Rewards for Learning Humanoid Locomotion
Jeon, Se Hwan, Heim, Steve, Khazoom, Charles, Kim, Sangbae
The main challenge in developing effective reinforcement learning (RL) pipelines is often designing and tuning the reward functions. A well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting, or even erratic performance, if not properly tuned. In theory, the broad class of potential-based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of PBRS to accelerate learning convergence, most have been limited to grid worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping against PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS offers only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and are thus easier to tune.
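As a reference point, the following sketch shows the standard potential-based shaping term and one hypothetical way of attaching it to a task reward; the potential and the velocity-tracking example are illustrative, not the reward terms benchmarked in the paper.

def pbrs_term(phi_s, phi_s_next, gamma):
    # Potential-based shaping F(s, s') = gamma * Phi(s') - Phi(s); adding F to the
    # task reward provably leaves the optimal policy unchanged (Ng et al., 1999).
    return gamma * phi_s_next - phi_s

def shaped_reward(task_reward, state, next_state, gamma=0.99):
    # Hypothetical potential: negative velocity-tracking error of the robot base.
    phi = lambda s: -abs(s["base_velocity"] - s["target_velocity"])
    return task_reward + pbrs_term(phi(state), phi(next_state), gamma)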
Safe Value Functions
Massiani, Pierre-François, Heim, Steve, Solowjow, Friedrich, Trimpe, Sebastian
Safety constraints and optimality are important, but sometimes conflicting, criteria for controllers. Although these criteria are often addressed separately with different tools to maintain formal guarantees, it is also common practice in reinforcement learning to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine the relationship of both safety and optimality to penalties, and formalize sufficient conditions for safe value functions (SVFs): value functions that are both optimal for a given task and enforce safety constraints. We reveal this structure by examining when rewards preserve viability under optimal control, and show that there always exists a finite penalty that induces a safe value function. This penalty is not unique, but upper-unbounded: larger penalties do not harm optimality. Although it is often not possible to compute the minimum required penalty, we reveal a clear structure of how the penalty, rewards, discount factor, and dynamics interact. This insight suggests practical, theory-guided heuristics for designing reward functions for control problems where safety is important.
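The reward modification under study can be written compactly; the sketch below is a minimal illustration with a scalar failure penalty, where the failure indicator and penalty value are placeholders rather than quantities prescribed by the paper.

def penalized_reward(task_reward, failed, penalty):
    # Subtract a constant penalty whenever the system reaches a failure state.
    # The analysis shows a finite (but not unique) penalty exists above which the
    # resulting optimal value function is a safe value function: optimal for the
    # task and safety-enforcing, with larger penalties not harming optimality.
    return task_reward - penalty if failed else task_reward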
A Learnable Safety Measure
Heim, Steve, von Rohr, Alexander, Trimpe, Sebastian, Badri-Spröwitz, Alexander
Failures are challenging for learning to control physical systems, since they risk damage, incur time-consuming resets, and often provide little gradient information. Adding safety constraints to exploration typically requires a lot of prior knowledge and domain expertise. We present a safety measure which implicitly captures how the system dynamics relate to a set of failure states. Not only can this measure be used as a safety function, but it can also be used to directly compute the set of safe state-action pairs. Further, we show a model-free approach to learning this measure by active sampling using Gaussian processes. While safety can only be guaranteed after the safety measure has been learned, we show that failures can already be greatly reduced by using the estimated measure during learning.
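A minimal sketch of what learning such a measure by active sampling could look like, using a Gaussian process over state-action pairs; the kernel, the viability labels, and the acquisition rule below are illustrative assumptions, not the paper's exact algorithm.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def estimate_safety_measure(state_actions, outcomes, candidates, beta=2.0):
    # state_actions: (N, D) sampled state-action pairs
    # outcomes: (N,) 1.0 for rollouts that stayed viable, 0.0 for failures
    # candidates: (M, D) state-action pairs to evaluate next
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5))
    gp.fit(state_actions, outcomes)
    mean, std = gp.predict(candidates, return_std=True)
    safe_mask = (mean - beta * std) > 0.5          # conservative estimate of the safe set
    # Active sampling: among (estimated) safe candidates, query the most uncertain one.
    next_idx = np.argmax(np.where(safe_mask, std, -np.inf))
    return mean, safe_mask, candidates[next_idx]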
Learning from Outside the Viability Kernel: Why we Should Build Robots that can Fall with Grace
Heim, Steve, Spröwitz, Alexander
Despite impressive results using reinforcement learning to solve complex problems from scratch, in robotics this has still been largely limited to model-based learning with very informative reward functions. One of the major challenges is that the reward landscape often has large patches with no gradient, making it difficult to sample gradients effectively. We show here that the robot's state initialization can have a more important effect on the reward landscape than is generally expected. In particular, we show the counter-intuitive benefit of including initializations that are unviable, in other words, initializing in states that are doomed to fail.
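A minimal sketch of the initialization strategy this suggests, assuming simple box bounds on an estimated viable region and deliberately sampling beyond them; the bounds and margin are placeholders.

import numpy as np

def sample_initial_state(rng, viable_low, viable_high, margin=0.3):
    # Widen the sampling box beyond the estimated viability kernel so that some
    # episodes start in states that are doomed to fail; these rollouts still
    # shape the reward landscape near the kernel boundary.
    span = viable_high - viable_low
    return rng.uniform(viable_low - margin * span, viable_high + margin * span)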