 Reinforcement Learning


Using Stories to Teach Human Values to Artificial Agents

AAAI Conferences

Value alignment is a property of an intelligent agent indicating that it can only pursue goals that are beneficial to humans. Successful value alignment should ensure that an artificial general intelligence cannot intentionally or unintentionally perform behaviors that adversely affect humans. This is problematic in practice since values are difficult for human programmers to enumerate exhaustively. For value alignment to succeed, we argue that values should be learned. In this paper, we hypothesize that an artificial intelligence that can read and understand stories can learn the values tacitly held by the culture from which the stories originate. We describe preliminary work on using stories to generate a value-aligned reward signal for reinforcement learning agents that prevents psychotic-appearing behavior.


Towards Learning From Stories: An Approach to Interactive Machine Learning

AAAI Conferences

In this work, we introduce a technique that uses stories to train virtual agents to exhibit believable behavior. This technique uses a compact representation of a story to define the space of acceptable behaviors and then uses this space to assign rewards to certain world states. We show the effectiveness of our technique with a case study in a modified gridworld environment called Pharmacy World. The results show that a reinforcement learning agent using Q-learning was able to learn a policy that results in believable behavior.
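As a rough illustration of the idea in this abstract, the sketch below runs tabular Q-learning on a five-cell corridor rather than on Pharmacy World. The corridor, the `acceptable` set standing in for the story-derived space of behaviors, the reward values, and the function names `story_reward`, `train_q`, and `greedy_policy` are all assumptions made for this example and are not taken from the paper.

```python
import random

ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def story_reward(state, goal, acceptable):
    """Hypothetical reward: bonus at the goal, penalty outside the acceptable states."""
    if state == goal:
        return 10.0
    return 0.0 if state in acceptable else -1.0


def train_q(n_states=5, goal=4, acceptable=frozenset({1, 2, 3}),
            episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            # Moves are clipped at the corridor walls.
            nxt = min(max(state + ACTIONS[action], 0), n_states - 1)
            reward = story_reward(nxt, goal, acceptable)
            done = nxt == goal
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Index of the highest-valued action in each state."""
    return [max(range(2), key=lambda a: row[a]) for row in q]
```

Calling `greedy_policy(train_q())` gives the learned action for each cell; in this toy setup the agent should learn to move right toward the goal from every non-goal cell.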


Scaling of Cloud Applications Using Machine Learning - VMware Technical Journal

#artificialintelligence

Today's Internet applications are required to be highly scalable and available in the face of rapidly changing, unpredictable workloads. Multi-tier architecture is commonly used to build Internet applications, with different tiers providing load balancing, application logic, and persistence. The advent of cloud computing has given rise to rapid horizontal scaling of applications hosted in virtual machines (VMs) in each of the tiers. Currently, this scaling is done by monitoring system-level metrics (e.g., CPU utilization) and determining whether to scale out or in based on a threshold. These threshold-based algorithms, however, do not capture the complex interaction among multiple tiers, and determining the right set of thresholds for multiple resources to achieve a particular service level objective (SLO) is difficult. In this paper, we present vScale, a horizontal scaling system that can automatically scale the number of VMs in a tier to meet end-to-end application SLOs.
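For context, the threshold-based baseline that the abstract criticizes can be sketched in a few lines. The function name `threshold_scale`, the threshold values, and the VM limits are hypothetical and are not vScale's algorithm; the sketch only shows why a single-tier, single-metric rule cannot see interactions between tiers.

```python
def threshold_scale(cpu_utilization, n_vms, high=0.8, low=0.3,
                    min_vms=1, max_vms=10):
    """Return the new VM count for one tier from one metric (illustrative only)."""
    if cpu_utilization > high and n_vms < max_vms:
        return n_vms + 1  # scale out
    if cpu_utilization < low and n_vms > min_vms:
        return n_vms - 1  # scale in
    return n_vms  # stay put inside the band or at a limit
```

The rule looks only at the CPU utilization of the tier it manages, so a bottleneck in another tier, or an end-to-end latency objective, never enters the decision.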


A statistical learning strategy for closed-loop control of fluid flows

arXiv.org Machine Learning

This work discusses a closed-loop control strategy for complex systems utilizing scarce and streaming data. A discrete embedding space is first built using hash functions applied to the sensor measurements, from which a Markov process model is derived, approximating the complex system's dynamics. A control strategy is then learned using reinforcement learning once rewards relevant to the control objective are identified. This method is designed for experimental configurations, requiring neither computations nor prior knowledge of the system, and enjoys intrinsic robustness. It is illustrated on two systems: the control of the transitions of a Lorenz 63 dynamical system, and the control of the drag of a cylinder flow. The method is shown to perform well.
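The first two steps described above (hash the sensor measurements into discrete states, then estimate a Markov model from the stream) might look roughly like the sketch below. The abstract does not say which hash functions are used, so the grid quantization in `embed` is a stand-in, and the names `embed`, `fit_markov`, and the `cell` parameter are assumptions for illustration.

```python
import math
from collections import defaultdict


def embed(measurement, cell=0.5):
    """Map a real-valued sensor vector to a discrete cell (a simple grid hash)."""
    return tuple(math.floor(x / cell) for x in measurement)


def fit_markov(measurements, cell=0.5):
    """Estimate transition probabilities between consecutive embedded states."""
    counts = defaultdict(lambda: defaultdict(int))
    states = [embed(m, cell) for m in measurements]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    model = {}
    for s, row in counts.items():
        total = sum(row.values())
        model[s] = {s_next: c / total for s_next, c in row.items()}
    return model
```

A reinforcement learning method could then be run on the discrete states, with rewards defined from the control objective; that part is not sketched here.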


Reinforcement learning programming implementations • /r/MachineLearning

@machinelearnbot

It's a cool opener on the concepts, but it leaves the actual implementations very hazy. For instance, I would love to understand how to create my own environment (or task, for that matter). Instead, this tutorial just throws ready-made stuff at you which, I reckon, isn't very helpful for actually understanding it. If there are good explanations involving programming, I would be very keen to look into them.


Question about experience replay in deep q learning • /r/MachineLearning

@machinelearnbot

I am not sure whether I understood it correctly. In each state, we update the score of the chosen action to the best Q-value of the next state and leave the scores of the other actions unchanged. Moreover, we put the state and the updated scores of all actions into memory. We sample N pairs from the memory (do they need to be from the same game?) and train on them all together. So we only calculate the new score of the transition that we just took and reuse the calculated scores of previous transitions stored in memory?
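For reference, here is a minimal sketch of experience replay as it is commonly formulated, which may differ from the tutorial the poster is following. In this formulation the memory holds raw transitions (state, action, reward, next state, done) rather than updated scores, batches are drawn at random from the whole memory regardless of which game they came from, and the target for each sampled transition is recomputed at training time. A dictionary of Q-values stands in for the neural network, and the names `ReplayBuffer` and `replay_update` are made up for this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity=1000, seed=0):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement across all stored games.
        return self.rng.sample(list(self.memory), min(batch_size, len(self.memory)))


def replay_update(q, batch, alpha=0.1, gamma=0.99):
    """Update only the taken action of each sampled transition."""
    for state, action, reward, next_state, done in batch:
        # The target is recomputed from the current Q-values, not read from memory.
        target = reward if done else reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
```

With a neural network, the same target would be used as the regression label for the taken action's output, while the other outputs keep the network's own current predictions so that they contribute no error.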


What are the best books about machine learning?

#artificialintelligence

There are also many good books that focus on one particular topic. For example, Sutton and Barto's Reinforcement Learning is a classic. And Yoshua Bengio's Deep Learning book (available online) is almost becoming a classic before it is published. But, you need a few of those books in order to build a somewhat comprehensive and balanced understanding of the field.


Defining Reward for Deep Reinforcement Learning? • /r/MachineLearning

@machinelearnbot

I am designing a neural network in Lasagne, a Theano-based deep learning library. I am trying to program a simple reinforcement learning network but am running into a roadblock in defining the loss function. Basically, the input can be thought of as the location of the AI. The AI needs to get closer to a fixed destination point. The distance can be calculated from the input alone.
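One common way to turn "get closer to a fixed destination" into a reward is to pay the agent for the distance it closes on each step. The sketch below is plain Python rather than Lasagne or Theano, and the function names and the choice of Euclidean distance are assumptions for illustration, not the poster's setup.

```python
import math


def distance(position, destination):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(position, destination)


def progress_reward(position, next_position, destination):
    """Positive when the step moves closer to the destination, negative when it moves away."""
    return distance(position, destination) - distance(next_position, destination)
```

For example, `progress_reward((0, 0), (3, 0), (3, 4))` compares a distance of 5 before the step with a distance of 4 after it. The reward is a training signal and is separate from the loss: the loss is then defined by whichever learning method consumes the reward, such as a Q-learning target.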


Before AlphaGo there was TD-Gammon -- Jim Fleming

#artificialintelligence

Check out the GitHub repo for an implementation of TD-Gammon with TensorFlow. A few weeks ago AlphaGo won a historic tournament playing the game of Go against Lee Sedol, one of the top Go players in the world. Many people have compared AlphaGo to Deep Blue, which won a series of famous chess matches against Garry Kasparov, but a different comparison may be made for the game of backgammon. Before DeepMind tackled playing Atari games or built AlphaGo there was TD-Gammon, the first algorithm to reach an expert level of play in backgammon. Gerald Tesauro published his paper in 1992 describing TD-Gammon as a neural network trained with reinforcement learning.