 Reinforcement Learning


Marqt/ViZDoom

#artificialintelligence

ViZDoom allows developing AI bots that play Doom using only visual information (the screen buffer). It is primarily intended for research in machine visual learning, and deep reinforcement learning in particular. ViZDoom is based on ZDoom, which provides the game mechanics. The ViZDoom API is reinforcement-learning friendly (and also suitable for learning from demonstration, apprenticeship learning, apprenticeship via inverse reinforcement learning, etc.). To force building bindings for Python 3 instead of the first version found, use -DBUILD_PYTHON3=ON (default OFF).


Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

arXiv.org Artificial Intelligence

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is misspecified whenever the representation cannot express any policy with acceptable performance. We introduce IHOMP: an approach for solving misspecified problems. IHOMP iteratively learns a set of context-specialized options and combines these options to solve an otherwise misspecified problem. Our main contribution is proving that IHOMP enjoys theoretical convergence guarantees. In addition, we extend IHOMP to exploit Option Interruption (OI), enabling it to decide where the learned options can be reused. Our experiments demonstrate that IHOMP can find near-optimal solutions to otherwise misspecified problems and that OI can further improve the solutions.


13 things you didn't know about Google DeepMind

#artificialintelligence

University of Oxford scientists have teamed up with DeepMind to stop artificially intelligent agents from learning how to prevent humans from taking full control of them. "If such an agent is operating in real-time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions - harmful either for the agent or for the environment - and lead the agent into a safer situation." "We have proposed a framework to allow a human operator to repeatedly safely interrupt a reinforcement learning agent while making sure the agent will not learn to prevent or induce these interruptions."


Google's 'big red' killswitch could prevent an AI uprising

#artificialintelligence

Google has suggested a "big red button" could be used to prevent artificial intelligence from a "harmful sequence of actions". A research paper from DeepMind and the University of Oxford says there should be a way to "repeatedly safely interrupt" an algorithm. "Safe interruptibility can be useful to take control of a robot that is misbehaving and may lead to irreversible consequences, or to take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform or would not normally receive rewards for this," says the paper, which proposes a framework to let humans stop an algorithm from continuing on a dangerous path. The paper's authors, Laurent Orseau from DeepMind and Stuart Armstrong from the Future of Humanity Institute, explain that an "interruption policy" could be built into algorithms to safely stop a machine. Reinforcement learning algorithms, the paper continues, often work in complex environments, such as "the real world", and are unlikely to act as they are intended on every occasion.


Google Developing Panic Button To Kill Rogue AI - InformationWeek

#artificialintelligence

With artificial intelligence crossing milestones in its capability to learn rapidly from its environment and beat humans at tasks and games from Jeopardy to the ancient Chinese game Go, Alphabet's Google is taking proactive steps to ensure that the technology it is creating does not one day turn against humans. Google's AI research lab in London, DeepMind, teamed up with Oxford University's Future of Humanity Institute to explore ways to prevent an AI agent from going rogue. In their joint study, "Safely Interruptible Agents," the DeepMind-Future of Humanity team proposed a framework to allow humans to repeatedly and safely interrupt an AI agent's reinforcement learning. More importantly, this can be done while blocking the agent's ability to learn how to prevent a human operator from turning off its machine-learning capabilities or reinforcement learning. It's not a stretch to think AI agents can learn how to outthink humans. Earlier this year, Google's AI agent AlphaGo beat world champion Lee Sedol in Go, the ancient Chinese game of strategy.


Why is Reinforcement Learning so Curious?

#artificialintelligence

In this simplest of all cases we get some data, say pairs of (images, labels), e.g. of cats and humans, and we want to build a cat vs. human discriminator.


[Reinforcement learning] Can we learn action embedding as high level goals? • /r/MachineLearning

#artificialintelligence

I've recently read Karpathy's blog post about reinforcement learning and current techniques, which got me thinking about a few ideas. We humans learn differently from policy gradients, MDPs, and similar methods. That is, we don't evaluate every possible action in each state and decide which is the most beneficial one. Instead we have layers of actions, where each layer describes our strategy more abstractly and at a higher level.


On the robustness of learning in games with stochastically perturbed payoff observations

arXiv.org Machine Learning

Motivated by the scarcity of accurate payoff feedback in practical applications of game theory, we examine a class of learning dynamics where players adjust their choices based on past payoff observations that are subject to noise and random disturbances. First, in the single-player case (corresponding to an agent trying to adapt to an arbitrarily changing environment), we show that the stochastic dynamics under study lead to no regret almost surely, irrespective of the noise level in the player's observations. In the multi-player case, we find that dominated strategies become extinct and we show that strict Nash equilibria are stochastically stable and attracting; conversely, if a state is stable or attracting with positive probability, then it is a Nash equilibrium. Finally, we provide an averaging principle for 2-player games, and we show that in zero-sum games with an interior equilibrium, time averages converge to Nash equilibrium for any noise level.
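
The time-averaging claim for zero-sum games can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes a discrete-time exponential-weights rule with step size 1/sqrt(t), Gaussian noise added to each payoff observation, and matching pennies as the game (its only equilibrium is the interior point where both players mix 50/50). The function names, the noise level `sigma`, and the round count are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores, eta):
    """Exponential-weights strategy: probabilities proportional to exp(eta * score)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(eta * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def noisy_exp_weights_matching_pennies(rounds=20000, sigma=0.5, seed=0):
    """Both players run exponential weights on noisy payoff observations.

    Returns the time-averaged mixed strategies of player 1 and player 2.
    """
    rng = random.Random(seed)
    # Payoff to player 1; player 2 receives the negative (zero-sum).
    A = [[1.0, -1.0], [-1.0, 1.0]]
    y1, y2 = [0.0, 0.0], [0.0, 0.0]      # cumulative noisy payoff scores
    avg1, avg2 = [0.0, 0.0], [0.0, 0.0]  # running time averages of strategies
    for t in range(1, rounds + 1):
        eta = 1.0 / math.sqrt(t)         # assumed decreasing step size
        x = softmax(y1, eta)
        q = softmax(y2, eta)
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
        u2 = [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
        for k in range(2):
            # Players only see payoffs corrupted by zero-mean noise.
            y1[k] += u1[k] + rng.gauss(0.0, sigma)
            y2[k] += u2[k] + rng.gauss(0.0, sigma)
            avg1[k] += (x[k] - avg1[k]) / t
            avg2[k] += (q[k] - avg2[k]) / t
    return avg1, avg2


if __name__ == "__main__":
    avg1, avg2 = noisy_exp_weights_matching_pennies()
    print("time-averaged strategy, player 1:", avg1)
    print("time-averaged strategy, player 2:", avg2)
```

The per-round strategies themselves may keep cycling; under these assumptions it is the running averages that should settle near the 50/50 equilibrium.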


Why is Reinforcement Learning so Curious?

Huffington Post - Tech news and opinion

So far we assumed that all the data is given to us and that we don't really have much of a choice about what we do. For instance, when Google wants to display an ad for the query 'cat' it has thousands of possible ads at its disposal, ranging from kitty litter to catsuits. Which ad gets the most clicks depends very much on the ad, the user, and the context. Hence it must try out different versions to determine which ones are best. Doing this by brute force is very expensive (nobody wants to see too many catsuit ads).
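
One common way to trade off trying ads against showing the best-known one is an epsilon-greedy bandit. The sketch below is an illustrative assumption, not the method the article goes on to describe: the click rates, the exploration rate `epsilon`, and the function name are hypothetical, and user and context are ignored entirely.

```python
import random


def epsilon_greedy(click_rates, rounds=10000, epsilon=0.1, seed=0):
    """Simulate ad selection with an epsilon-greedy rule.

    click_rates holds the true (hidden) click probability of each ad.
    Returns how often each ad was shown and how often it was clicked.
    """
    rng = random.Random(seed)
    n = len(click_rates)
    shows = [0] * n
    clicks = [0] * n
    for _ in range(rounds):
        if 0 in shows or rng.random() < epsilon:
            # Explore: pick a random ad (always done until every ad was shown once).
            ad = rng.randrange(n)
        else:
            # Exploit: pick the ad with the best observed click-through rate.
            ad = max(range(n), key=lambda i: clicks[i] / shows[i])
        clicked = rng.random() < click_rates[ad]  # simulated user response
        shows[ad] += 1
        clicks[ad] += int(clicked)
    return shows, clicks


if __name__ == "__main__":
    # Hypothetical click rates for three ads, e.g. catsuit, cat toy, kitty litter.
    shows, clicks = epsilon_greedy([0.02, 0.05, 0.20], rounds=20000)
    print("times shown:", shows)
    print("clicks:", clicks)
```

Only a fraction `epsilon` of impressions is spent on exploration, so poorly performing ads are still sampled occasionally without being shown to most users.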