Agents
On the Detection of Mutual Influences and Their Consideration in Reinforcement Learning Processes
Rudolph, Stefan, Tomforde, Sven, Hähner, Jörg
Self-adaptation has been proposed as a mechanism to counter complexity in control problems of technical systems. A major driver behind self-adaptation is the idea to transfer traditional design-time decisions to runtime and into the responsibility of systems themselves. In order to deal with unforeseen events and conditions, systems need creativity -- typically realized by means of machine learning capabilities. Such learning mechanisms are based on different sources of knowledge. Feedback from the environment used for reinforcement purposes is probably the most prominent one within the self-adapting and self-organizing (SASO) systems community. However, the impact of other (sub-)systems on the success of the individual system's learning performance has mostly been neglected in this context. In this article, we propose a novel methodology to identify effects of actions performed by other systems in a shared environment on the utility achievement of an autonomous system. Consider smart cameras (SC) as illustrating example: For goals such as 3D reconstruction of objects, the most promising configuration of one SC in terms of pan/tilt/zoom parameters depends largely on the configuration of other SCs in the vicinity. Since such mutual influences cannot be pre-defined for dynamic systems, they have to be learned at runtime. Furthermore, they have to be taken into consideration when self-improving the own configuration decisions based on a feedback loop concept, e.g., known from the SASO domain or the Autonomic and Organic Computing initiatives. We define a methodology to detect such influences at runtime, present an approach to consider this information in a reinforcement learning technique, and analyze the behavior in artificial as well as real-world SASO system settings.
Quantifying Teaching Behaviour in Robot Learning from Demonstration
Learning from demonstration allows for rapid deployment of robot manipulators to a great many tasks, by relying on a person showing the robot what to do rather than programming it. While this approach provides many opportunities, measuring, evaluating and improving the person's teaching ability has remained largely unexplored in robot manipulation research. To this end, a model for learning from demonstration is presented here which incorporates the teacher's understanding of, and influence on, the learner. The proposed model is used to clarify the teacher's objectives during learning from demonstration, providing new views on how teaching failures and efficiency can be defined. The benefit of this approach is shown in two experiments (N=30 and N=36, respectively), which highlight the difficulty teachers have in providing effective demonstrations, and show how ~169-180% improvement in teaching efficiency can be achieved through evaluation and feedback shaped by the proposed framework, relative to unguided teaching.
Emergent Escape-based Flocking Behavior using Multi-Agent Reinforcement Learning
Hahn, Carsten, Phan, Thomy, Gabor, Thomas, Belzner, Lenz, Linnhoff-Popien, Claudia
In nature, flocking or swarm behavior is observed in many species as it has beneficial properties like reducing the probability of being caught by a predator. In this paper, we propose SELFish (Swarm Emergent Learning Fish), an approach with multiple autonomous agents which can freely move in a continuous space with the objective to avoid being caught by a present predator. The predator has the property that it might get distracted by multiple possible preys in its vicinity. We show that this property in interaction with self-interested agents which are trained with reinforcement learning to solely survive as long as possible leads to flocking behavior similar to Boids, a common simulation for flocking behavior. Furthermore we present interesting insights in the swarming behavior and in the process of agents being caught in our modeled environment.
The sharp, the flat and the shallow: Can weakly interacting agents learn to escape bad minima?
Kantas, Nikolas, Parpas, Panos, Pavliotis, Grigorios A.
An open problem in machine learning is whether flat minima generalize better and how to compute such minima efficiently. This is a very challenging problem. As a first step towards understanding this question we formalize it as an optimization problem with weakly interacting agents. We review appropriate background material from the theory of stochastic processes and provide insights that are relevant to practitioners. We propose an algorithmic framework for an extended stochastic gradient Langevin dynamics and illustrate its potential. The paper is written as a tutorial, and presents an alternative use of multi-agent learning. Our primary focus is on the design of algorithms for machine learning applications; however the underlying mathematical framework is suitable for the understanding of large scale systems of agent based models that are popular in the social sciences, economics and finance.
Training Neural Networks with PSO - Vortarus Technologies LLC
In this article we are going to discuss training neural networks using particle swarm optimization (PSO). Training a neural network is an optimization problem so the optimization algorithm is of primary importance. When training MLPs we are adjusting weights between neurons using an error function as our optimization objective. PNNs and GRNNs use a smoothing factor, σ to define the network. The objective is to find sigmas that minimize error.
Smoothing Policies and Safe Policy Gradients
Papini, Matteo, Pirotta, Matteo, Restelli, Marcello
Policy gradient algorithms are among the best candidates for the much anticipated application of reinforcement learning to real-world control tasks, such as the ones arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows to identify those meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By a joint, adaptive selection of these meta-parameters, we obtain a safe policy gradient algorithm.
PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings
Rhinehart, Nicholas, McAllister, Rowan, Kitani, Kris, Levine, Sergey
For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions of multiple agents. We perform both standard forecasting and conditional forecasting with respect to the AV's goals. Conditional forecasting reasons about how all agents will likely respond to specific decisions of a controlled agent. We train our model on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios compared to existing state-of-the-art. Beyond its general ability to perform conditional forecasting queries, we show that our model's predictions of all agents improve when conditioned on knowledge of the AV's intentions, further illustrating its capability to model agent interactions.
Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process
Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such real-world complex system control is difficult to achieve because of high-dimensional and non-linear system dynamics, and the exploding state and action spaces for the decision maker. Prior methods can be divided into two categories: simulation-based and analytical approaches. Existing simulation approaches have high-variance in Monte Carlo integration, and the analytical approaches suffer from modeling inaccuracy. We adopted simulation modeling in specifying the complex dynamics of a complex system, and developed analytical solutions for searching optimal strategies in a complex network with high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision making problem as a discrete event decision process. To address the curse of dimensionality and search in high-dimensional state action spaces in complex systems, we reduce control of a complex system to variational inference and parameter learning, introduce Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm leads to higher system expected rewards, faster convergence, and lower variance of value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.
Learned human-agent decision-making, communication and joint action in a virtual reality environment
Pilarski, Patrick M., Butcher, Andrew, Johanson, Michael, Botvinick, Matthew M., Bolt, Andrew, Parker, Adam S. R.
Humans make decisions and act alongside other humans to pursue both short-term and long-term goals. As a result of ongoing progress in areas such as computing science and automation, humans now also interact with nonhuman agents of varying complexity as part of their day-to-day activities; substantial work is being done to integrate increasingly intelligent machine agents into human work and play. With increases in the cognitive, sensory, and motor capacity of these agents, intelligent machinery for human assistance can now reasonably be considered to engage in joint action with humans--i.e., two or more agents adapting their behaviour and their understanding of each other so as to progress in shared objectives or goals. The mechanisms, conditions, and opportunities for skillful joint action in human-machine partnerships is of great interest to multiple communities. Despite this, human-machine joint action is as yet underexplored, especially in cases where a human and an intelligent machine interact in a persistent way during the course of real-time, daily-life experience (as opposed to specialized, episodic, or time-limited settings such as game play, teaching, or task-focused personal computing applications). In this work, we contribute a virtual reality environment wherein a human and an agent can adapt their predictions, their actions, and their communication so as to pursue a simple foraging task. In a case study with a single participant, we provide an example of human-agent coordination and decision-making involving prediction learning on the part of the human and the machine agent, and control learning on the part of the machine agent wherein audio communication signals are used to cue its human partner in service of acquiring shared reward. These comparisons suggest the utility of studying human-machine coordination in a virtual reality environment, and identify further research that will expand our understanding of persistent human-machine joint action.
Introducing the Hearthstone-AI Competition
Dockhorn, Alexander, Mostaghim, Sanaz
The Hearthstone AI framework and competition motivates the development of artificial intelligence agents that can play collectible card games. A special feature of those games is the high variety of cards, which can be chosen by the players to create their own decks. In contrast to simpler card games, the value of many cards is determined by their possible synergies. The vast amount of possible decks, the randomness of the game, as well as the restricted information during the player's turn offer quite a hard challenge for the development of game-playing agents. This short paper introduces the competition framework and goes into more detail on the problems and challenges that need to be faced during the development process.