AITopics | Schäfer, Lukas

Collaborating Authors

Schäfer, Lukas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning

McInroe, Trevor, Schäfer, Lukas, Albrecht, Stefano V.

arXiv.org Artificial IntelligenceJan-29-2024

Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions, which may cause learning inefficiencies if important environmental changes take many steps to manifest. We propose Hierarchical $k$-Step Latent (HKSL), an auxiliary task that learns multiple representations via a hierarchy of forward models that learn to communicate and an ensemble of $n$-step critics that all operate at varying magnitudes of step skipping. We evaluate HKSL in a suite of 30 robotic control tasks with and without distractors and a task of our creation. We find that HKSL either converges to higher or optimal episodic returns more quickly than several alternative representation learning approaches. Furthermore, we find that HKSL's representations capture task-relevant details accurately across timescales (even in the presence of distractors) and that communication channels between hierarchy levels organize information based on both sides of the communication process, both of which improve sample efficiency.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2206.11396

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

Schäfer, Lukas, Jones, Logan, Kanervisto, Anssi, Cao, Yuhan, Rashid, Tabish, Georgescu, Raluca, Bignell, Dave, Sen, Siddhartha, Gavito, Andrea Treviño, Devlin, Sam

arXiv.org Artificial IntelligenceDec-4-2023

Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it is currently unclear which of these models have learnt representations that retain information critical for sequential decision making. Towards enabling wider participation in the research of gameplaying agents in modern games, we present a systematic study of imitation learning with publicly available visual encoders compared to the typical, task-specific, end-to-end training approach in Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive. Figure 1: Representative screenshots of all games studied in this paper. However, video games do not only serve as benchmarks but also represent a vast entertainment industry where AI agents may eventually have applications in games development, including game testing or game design (Jacob et al., 2020; Gillberg et al., 2023). In the past, video game research often necessitated close integration with the games themselves to obtain game-specific information and establish a scalable interface for training agents. Work was conducted during an internship at Microsoft Research. To eliminate integration costs during training, we use behavior cloning to train agents entirely offline, utilising previously collected human gameplay data. Although prior research has explored encoding images into lower-dimensional representations for behavior cloning, these studies primarily targeted robotics applications (Nair et al., 2022), where images often resemble real-world scenes. Inspired by the challenges and potential applications in video games, we investigate the following research question: How can images be encoded for data-efficient imitation learning in modern video games? Towards our guiding research question, we compare both end-to-end trained visual encoders and pre-trained visual encoders in three modern video games: Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive (CS:GO).

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2312.02312

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning

Schäfer, Lukas, Christianos, Filippos, Storkey, Amos, Albrecht, Stefano V.

arXiv.org Artificial IntelligenceNov-20-2023

Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2207.02249

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking

Krasowski, Hanna, Thumm, Jakob, Müller, Marlon, Schäfer, Lukas, Wang, Xiao, Althoff, Matthias

arXiv.org Artificial IntelligenceNov-18-2023

Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty, every time the safety verification is engaged, improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.

action replacement, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2205.0675

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Energy > Oil & Gas (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Krnjaic, Aleksandar, Steleac, Raul D., Thomas, Jonathan D., Papoudakis, Georgios, Schäfer, Lukas, To, Andrew Wing Keung, Lao, Kuan-Ho, Cubuktepe, Murat, Haley, Matthew, Börsting, Peter, Albrecht, Stefano V.

arXiv.org Artificial IntelligenceJul-7-2023

We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2212.11498

Country:

South America > Brazil (0.14)
Europe (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

Andres, Alain, Schäfer, Lukas, Villar-Rodriguez, Esther, Albrecht, Stefano V., Del Ser, Javier

arXiv.org Artificial IntelligenceApr-18-2023

One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2304.09825

Country: Europe > Spain (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Online (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

Schäfer, Lukas, Slumbers, Oliver, McAleer, Stephen, Du, Yali, Albrecht, Stefano V., Mguni, David

arXiv.org Artificial IntelligenceApr-16-2023

Cooperative multi-agent reinforcement learning (MARL) requires agents to explore to learn to cooperate. Existing value-based MARL algorithms commonly rely on random exploration, such as $\epsilon$-greedy, which is inefficient in discovering multi-agent cooperation. Additionally, the environment in MARL appears non-stationary to any individual agent due to the simultaneous training of other agents, leading to highly variant and thus unstable optimisation signals. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to extend any value-based MARL algorithm. EMAX trains ensembles of value functions for each agent to address the key challenges of exploration and non-stationarity: (1) The uncertainty of value estimates across the ensemble is used in a UCB policy to guide the exploration of agents to parts of the environment which require cooperation. (2) Average value estimates across the ensemble serve as target values. These targets exhibit lower variance compared to commonly applied target networks and we show that they lead to more stable gradients during the optimisation. We instantiate three value-based MARL algorithms with EMAX, independent DQN, VDN and QMIX, and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves sample efficiency and final evaluation returns of these algorithms by 53%, 36%, and 498%, respectively, averaged all 21 tasks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2302.03439

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning

McInroe, Trevor, Schäfer, Lukas, Albrecht, Stefano V.

arXiv.org Artificial IntelligenceOct-10-2021

Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest in poor sample efficiency. We present $k$-Step Latent (KSL), a new representation learning method that enforces temporal consistency of representations via a self-supervised auxiliary task wherein agents learn to recurrently predict action-conditioned representations of the state space. The state encoder learned by KSL produces low-dimensional representations that make optimization of the RL task more sample efficient. Altogether, KSL produces state-of-the-art results in both data efficiency and asymptotic performance in the popular PlaNet benchmark suite. Our analyses show that KSL produces encoders that generalize better to new tasks unseen during training, and its representations are more strongly tied to reward, are more invariant to perturbations in the state space, and move more smoothly through the temporal axis of the RL problem than other methods such as DrQ, RAD, CURL, and SAC-AE.

machine learning, reinforcement learning, teaching method, (18 more...)

arXiv.org Artificial Intelligence

2110.04935

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Decoupling Exploration and Exploitation in Reinforcement Learning

Schäfer, Lukas, Christianos, Filippos, Hanna, Josiah, Albrecht, Stefano V.

arXiv.org Artificial IntelligenceJul-22-2021

Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns than intrinsically motivated baselines in fewer interactions.

artificial intelligence, intrinsic reward, upstream oil & gas, (15 more...)

arXiv.org Artificial Intelligence

2107.08966

Country:

Asia (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report (0.63)

Industry: Energy > Oil & Gas > Upstream (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms

Papoudakis, Georgios, Christianos, Filippos, Schäfer, Lukas, Albrecht, Stefano V.

arXiv.org Artificial IntelligenceJun-14-2020

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.

agent, artificial intelligence, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2006.07869

Genre: Research Report > New Finding (0.54)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback