This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This model-free reinforcement learning setting encompasses many real-world problems, such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high-variance estimates of the policy gradient. However, these methods are not guaranteed to find the optimal stochastic policy, and the high-variance gradient estimates make convergence unstable. To address these problems, we propose a technique that uses Markov Chain Monte Carlo to generate samples from the posterior distribution of the policy parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy and empirically exhibits variance similar to that of the policy gradient.
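To make the idea of "sampling parameters conditioned on being optimal" concrete, here is a minimal sketch under stated assumptions; it is not the paper's construction. It assumes a hypothetical two-armed bandit, a softmax policy with a single logit `theta`, a Gaussian prior on `theta`, and an optimality likelihood p(optimal | theta) proportional to exp(J(theta) / tau), a common control-as-inference choice. A plain random-walk Metropolis-Hastings sampler then draws from the resulting posterior. The names `expected_return`, `log_target`, `metropolis_hastings`, `TEMPERATURE`, and `PRIOR_STD` are illustrative only.

```python
import math
import random

# Hypothetical toy setting: a two-armed bandit whose arms pay 0 and 1.
# A softmax policy with one logit theta picks arm 1 with probability sigmoid(theta).
REWARDS = (0.0, 1.0)
TEMPERATURE = 0.05  # assumed "optimality" temperature tau
PRIOR_STD = 3.0     # assumed Gaussian prior over theta


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_return(theta: float) -> float:
    """J(theta): expected reward of the stochastic policy."""
    p1 = sigmoid(theta)
    return (1.0 - p1) * REWARDS[0] + p1 * REWARDS[1]


def log_target(theta: float) -> float:
    """Unnormalized log p(theta | optimal), assuming p(optimal | theta) proportional to exp(J / tau)."""
    log_prior = -0.5 * (theta / PRIOR_STD) ** 2
    return log_prior + expected_return(theta) / TEMPERATURE


def metropolis_hastings(n_samples: int, step: float = 1.0, burn_in: int = 1000, seed: int = 0) -> list:
    """Random-walk Metropolis-Hastings over the single policy parameter theta."""
    rng = random.Random(seed)
    theta, current = 0.0, log_target(0.0)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        proposed = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            theta, current = proposal, proposed
        if i >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    samples = metropolis_hastings(5000)
    print(f"posterior mean of theta: {sum(samples) / len(samples):.2f}")
```

The posterior mass concentrates on positive `theta`, i.e. on policies that favor the arm paying 1, while the prior keeps the samples from drifting off to infinity. In a real MDP, J(theta) would itself have to be estimated from rollouts, which this toy sidesteps by computing it exactly.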
This short paper describes a demonstrator that complements the paper "Towards Cross-Media Feature Extraction" in these proceedings. The demo illustrates how textual resources, from which semantic information can be extracted, can support the semantic annotation and indexing of associated video material in the soccer domain. Entities and events extracted from textual data are marked up with semantic classes derived from an ontology modeling the soccer domain. We further show how audio-video features extracted by video analysis can be used for additional annotation of specific soccer event types, and how these different types of annotation can be combined.
The recent win of AlphaGo over Lee Sedol--one of the world's highest ranked Go players--has resurfaced concerns about artificial intelligence. We have heard about A.I. stealing jobs, killer robots, algorithms that help diagnose and cure cancer, competent self-driving cars, perfect poker players, and more. It seems that for every mention of A.I. as humanity's top existential risk, there is a mention of its power to solve humanity's biggest challenges. Demis Hassabis--founder of Google DeepMind, the company behind AlphaGo--views A.I. as "potentially a meta-solution to any problem," and Eric Horvitz--director of research at Microsoft's Redmond, Washington, lab--claims that "A.I. will be incredibly empowering to humanity." By contrast, Bill Gates has called A.I. "a huge challenge" and something to "worry about," and Stephen Hawking has warned about A.I. ending humanity.
Artificial intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.

Type 1: Reactive Machines

Cortana, Siri, Google Now, A.L.I.C.E., Tumblrbots, AlphaGo, Deep Blue, and IBM's Watson are all examples of reactive machines: machines that learn, but only to a point. For example, Deep Blue, which beat world chess champion Garry Kasparov at his own game, could learn and predict possible moves and knew the rules of the game. But that was all; it could only learn, study, and play the game in real time.
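A reactive game player of the kind described above can be sketched as a function that looks only at the current position, applies the rules, and searches ahead over possible moves, keeping no record of earlier games. The sketch below is a toy illustration, not how Deep Blue or any of the systems listed is implemented: it assumes a hypothetical subtraction game (players alternately remove 1 to 3 stones; whoever takes the last stone wins) and uses a plain negamax search. The names `MAX_TAKE`, `value`, and `reactive_move` are illustrative.

```python
# Hypothetical rule: on each turn a player removes 1 to MAX_TAKE stones;
# the player who takes the last stone wins.
MAX_TAKE = 3


def legal_moves(stones: int) -> range:
    return range(1, min(MAX_TAKE, stones) + 1)


def value(stones: int) -> int:
    """Negamax value for the player to move: +1 = forced win, -1 = forced loss."""
    if stones == 0:
        # The opponent just took the last stone, so the player to move has lost.
        return -1
    # A position is as good for us as the worst it can be made for the opponent.
    return max(-value(stones - take) for take in legal_moves(stones))


def reactive_move(stones: int) -> int:
    """Choose a move from the current position alone; no memory of past games is used."""
    return max(legal_moves(stones), key=lambda take: -value(stones - take))


if __name__ == "__main__":
    for stones in (5, 6, 7):
        print(stones, "stones -> take", reactive_move(stones))
```

Because the agent recomputes everything from the board in front of it, it plays this toy game perfectly yet would make exactly the same choices no matter how many games it had already played, which is the limitation the passage attributes to reactive machines.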
Having notched impressive victories over human professionals in Go, Atari games, and most recently StarCraft 2, Google's DeepMind team has now turned its formidable research efforts to soccer. In a paper released last week, the UK AI company demonstrates a novel machine learning method that trains a team of AI agents to play a simulated version of "the beautiful game." Gaming, AI, and soccer fans hailed DeepMind's latest innovation on social media, with comments like "You should partner with EA Sports for a FIFA environment!" Machine learning, and particularly deep reinforcement learning, has in recent years achieved remarkable success across a wide range of competitive games. Collaborative multi-agent games, however, have remained a relatively difficult research domain.