AITopics | airl

Collaborating Authors

airl

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Reward Transferability in Adversarial Inverse Reinforcement Learning: Insights from Random Matrix Theory

Zhang, Yangchun, Zhou, Wang, Zhou, Yirui

arXiv.org Machine LearningDec-30-2024

In the context of inverse reinforcement learning (IRL) with a single expert, adversarial inverse reinforcement learning (AIRL) serves as a foundational approach to providing comprehensive and transferable task descriptions. However, AIRL faces practical performance challenges, primarily stemming from the framework's overly idealized decomposability condition, the unclear proof regarding the potential equilibrium in reward recovery, or questionable robustness in high-dimensional environments. This paper revisits AIRL in \textbf{high-dimensional scenarios where the state space tends to infinity}. Specifically, we first establish a necessary and sufficient condition for reward transferability by examining the rank of the matrix derived from subtracting the identity matrix from the transition matrix. Furthermore, leveraging random matrix theory, we analyze the spectral distribution of this matrix, demonstrating that our rank criterion holds with high probability even when the transition matrices are unobservable. This suggests that the limitations on transfer are not inherent to the AIRL framework itself, but are instead related to the training variance of the reinforcement learning algorithms employed within it. Based on this insight, we propose a hybrid framework that integrates on-policy proximal policy optimization in the source environment with off-policy soft actor-critic in the target environment, leading to significant improvements in reward transfer effectiveness.

singular value, source environment, target environment, (14 more...)

arXiv.org Machine Learning

2410.07643

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Maximum likelihood inference for high-dimensional problems with multiaffine variable relations

Brouillon, Jean-Sébastien, Dörfler, Florian, Ferrari-Trecate, Giancarlo

arXiv.org Artificial IntelligenceSep-5-2024

Maximum Likelihood Estimation of continuous variable models can be very challenging in high dimensions, due to potentially complex probability distributions. The existence of multiple interdependencies among variables can make it very difficult to establish convergence guarantees. This leads to a wide use of brute-force methods, such as grid searching and Monte-Carlo sampling and, when applicable, complex and problem-specific algorithms. In this paper, we consider inference problems where the variables are related by multiaffine expressions. We propose a novel Alternating and Iteratively-Reweighted Least Squares (AIRLS) algorithm, and prove its convergence for problems with Generalized Normal Distributions. We also provide an efficient method to compute the variance of the estimates obtained using AIRLS. Finally, we show how the method can be applied to graphical statistical models. We perform numerical experiments on several inference problems, showing significantly better performance than state-of-the-art approaches in terms of scalability, robustness to noise, and convergence speed due to an empirically observed super-linear convergence rate.

airl, algorithm 1, assumption 1, (15 more...)

arXiv.org Artificial Intelligence

2409.03495

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Ohio (0.04)
North America > United States > Michigan (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.84)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

Chen, Letian, Jayanthi, Sravan, Paleja, Rohan, Martin, Daniel, Zakharov, Viacheslav, Gombolay, Matthew

arXiv.org Artificial IntelligenceApr-12-2023

Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.

demonstration, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2209.11908

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Leisure & Entertainment > Sports > Tennis (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Robust online joint state/input/parameter estimation of linear systems

Brouillon, Jean-Sébastien, Moffat, Keith, Dörfler, Florian, Ferrari-Trecate, Giancarlo

arXiv.org Machine LearningApr-12-2022

This paper presents a method for jointly estimating the state, input, and parameters of linear systems in an online fashion. The method is specially designed for measurements that are corrupted with non-Gaussian noise or outliers, which are commonly found in engineering applications. In particular, it combines recursive, alternating, and iteratively-reweighted least squares into a single, one-step algorithm, which solves the estimation problem online and benefits from the robustness of least-deviation regression methods. The convergence of the iterative method is formally guaranteed. Numerical experiments show the good performance of the estimation algorithm in presence of outliers and in comparison to state-of-the-art methods.

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Machine Learning

2204.05663

Country:

North America > United States (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Monaco (0.04)

Genre: Research Report (0.70)

Industry: Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Objective-aware Traffic Simulation via Inverse Reinforcement Learning

Zheng, Guanjie, Liu, Hanyang, Xu, Kai, Li, Zhenhui

arXiv.org Artificial IntelligenceMay-20-2021

Traffic simulators act as an essential component in the operating and planning of transportation systems. Conventional traffic simulators usually employ a calibrated physical car-following model to describe vehicles' behaviors and their interactions with traffic environment. However, there is no universal physical model that can accurately predict the pattern of vehicle's behaviors in different situations. A fixed physical model tends to be less effective in a complicated environment given the non-stationary nature of traffic dynamics. In this paper, we formulate traffic simulation as an inverse reinforcement learning problem, and propose a parameter sharing adversarial inverse reinforcement learning model for dynamics-robust simulation learning. Our proposed model is able to imitate a vehicle's trajectories in the real world while simultaneously recovering the reward function that reveals the vehicle's true objective which is invariant to different dynamics. Extensive experiments on synthetic and real-world datasets show the superior performance of our approach compared to state-of-the-art methods and its robustness to variant dynamics of traffic.

reward function, trajectory, vehicle, (16 more...)

arXiv.org Artificial Intelligence

2105.0956

Country:

Asia > China > Zhejiang Province > Hangzhou (0.05)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Pennsylvania (0.04)
(3 more...)

Genre: Research Report (0.84)

Industry:

Transportation > Infrastructure & Services (0.89)
Transportation > Ground > Road (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments

Sestini, Alessandro, Kuhnle, Alexander, Bagdanov, Andrew D.

arXiv.org Artificial IntelligenceDec-4-2020

Deep Reinforcement Learning achieves very good results in domains where reward functions can be manually engineered. At the same time, there is growing interest within the community in using games based on Procedurally Content Generation (PCG) as benchmark environments since this type of environment is perfect for studying overfitting and generalization of agents under domain shift. Inverse Reinforcement Learning (IRL) can instead extrapolate reward functions from expert demonstrations, with good results even on high-dimensional problems, however there are no examples of applying these techniques to procedurally-generated environments. This is mostly due to the number of demonstrations needed to find a good reward model. We propose a technique based on Adversarial Inverse Reinforcement Learning which can significantly decrease the need for expert demonstrations in PCG games. Through the use of an environment with a limited set of initial seed levels, plus some modifications to stabilize training, we show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain. We demonstrate the effectiveness of our technique on two procedural environments, MiniGrid and DeepCrawl, for a variety of tasks.

demonstration, procenv, reward function, (17 more...)

arXiv.org Artificial Intelligence

2012.02527

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions

Venuto, David, Chakravorty, Jhelum, Boussioux, Leonard, Wang, Junhao, McCracken, Gavin, Precup, Doina

arXiv.org Machine LearningFeb-20-2020

Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods. While Inverse Reinforcement Learning (IRL) is a solution to recover reward functions from demonstrations only, these learned rewards are generally heavily \textit{entangled} with the dynamics of the environment and therefore not portable or \emph{robust} to changing environments. Modern adversarial methods have yielded some success in reducing reward entanglement in the IRL setting. In this work, we leverage one such method, Adversarial Inverse Reinforcement Learning (AIRL), to propose an algorithm that learns hierarchical disentangled rewards with a policy over options. We show that this method has the ability to learn \emph{generalizable} policies and reward functions in complex transfer learning tasks, while yielding results in continuous control benchmarks that are comparable to those of the state-of-the-art methods.

oirl, reward function, robust adversarial inverse reinforcement learning, (11 more...)

arXiv.org Machine Learning

2002.09043

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Adversarial Inverse Reinforcement Learning for Decision Making in Autonomous Driving

Wang, Pin, Liu, Dapeng, Chen, Jiayu, Chan, Ching-Yao

arXiv.org Artificial IntelligenceNov-18-2019

Adversarial Inverse Reinforcement Learning for Decision Making in Autonomous Driving Pin Wang 1, Dapeng Liu 2, 3, Jiayu Chen 4, and Ching-Y ao Chan 1 Abstract -- Generative Adversarial Imitation Learning (GAIL) is an efficient way to learn sequential control strategies from demonstration. Adversarial Inverse Reinforcement Learning (AIRL) is similar to GAIL but also learns a reward function at the same time and has better training stability. In previous work, however, AIRL has mostly been demonstrated on robotic control in artificial environments. In this paper, we apply AIRL to a practical and challenging problem - the decision-making in autonomous driving, and also augment AIRL with a semantic reward to improve its performance. We use four metrics to evaluate its learning performance in a simulated driving environment. Results show that the vehicle agent can learn decent decision-making behaviors from scratch, and can reach a level of performance comparable with that of an expert. Additionally, the comparison with GAIL shows that AIRL converges faster, achieves better and more stable performance than GAIL. I. INTRODUCTION The application of Reinforcement Learning (RL) in robotics has been very fruitful in recent years.

airl, discriminator, reward function, (12 more...)

arXiv.org Artificial Intelligence

1911.08044

Country:

North America > United States > California > Contra Costa County > Richmond (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Situated GAIL: Multitask imitation using task-conditioned adversarial inverse reinforcement learning

Kobayashi, Kyoichiro, Horii, Takato, Iwaki, Ryo, Nagai, Yukie, Asada, Minoru

arXiv.org Artificial IntelligenceNov-1-2019

Generative adversarial imitation learning (GAIL) has attracted increasing attention in the field of robot learning. It enables robots to learn a policy to achieve a task demonstrated by an expert while simultaneously estimating the reward function behind the expert's behaviors. However, this framework is limited to learning a single task with a single reward function. This study proposes an extended framework called situated GAIL (S-GAIL), in which a task variable is introduced to both the discriminator and generator of the GAIL framework. The task variable has the roles of discriminating different contexts and making the framework learn different reward functions and policies for multiple tasks. To achieve the early convergence of learning and robustness during reward estimation, we introduce a term to adjust the entropy regularization coefficient in the generator's objective function. Our experiments using two setups (navigation in a discrete grid world and arm reaching in a continuous space) demonstrate that the proposed framework can acquire multiple reward functions and policies more effectively than existing frameworks. The task variable enables our framework to differentiate contexts while sharing common knowledge among multiple tasks.

generator, reward function, s-gail, (16 more...)

arXiv.org Artificial Intelligence

1911.00238

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Adversarial Imitation via Variational Inverse Reinforcement Learning

Qureshi, Ahmed H., Yip, Michael C.

arXiv.org Artificial IntelligenceSep-17-2018

We consider a problem of learning a reward and policy from expert examples under unknown dynamics in high-dimensional scenarios. Our proposed method builds on the framework of generative adversarial networks and exploits reward shaping to learn near-optimal rewards and policies. Potential-based reward shaping functions are known to guide the learning agent whereas in this paper we bring forward their benefits in learning near-optimal rewards. Our method simultaneously learns a potential-based reward shaping function through variational information maximization along with the reward and policy under the adversarial learning formulation. We evaluate our method on various high-dimensional complex control tasks. We also evaluate our learned rewards in transfer learning problems where training and testing environments are made to be different from each other in terms of dynamics or structure. Our experimentation shows that our proposed method not only learns near-optimal rewards and policies matching expert behavior, but also performs significantly better than state-of-the-art inverse reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1809.06404

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback