Go player Lee Sedol during the third game of the Google DeepMind Challenge Match against AlphaGo, the Go-playing program developed by Google DeepMind.
Leading Australian artificial intelligence scientist Professor Toby Walsh is warning that we are "sleepwalking" into an AI future in which billions of machines and computers will be able to think. Professor Walsh, from the University of New South Wales, is calling for a national discussion about whether society needs to adopt clear boundaries and guidelines for how AI is developed and how it is used in our lives. In his book It's Alive: Artificial Intelligence From The Logic Piano to Killer Robots, he highlights key questions through a series of predictions describing how our future could be far better or far worse because of AI. Here's how he thinks society might change by 2050 as a result of artificial intelligence.
When Google DeepMind's AlphaGo shockingly defeated legendary Go player Lee Sedol in 2016, the terms artificial intelligence (AI), machine learning and deep learning were propelled into the technological mainstream. AI is generally defined as the capacity of a computer or machine to exhibit or simulate intelligent behaviour, as seen in Tesla's self-driving cars and Apple's digital assistant Siri. It is a thriving field and the focus of much research and investment. Machine learning is the ability of an AI system to extract patterns from raw data and use them to make predictions about new data. Deep learning is a subset of machine learning that uses artificial neural networks with many layers to learn those patterns directly from the data.
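As a minimal illustration of "learning from raw data and predicting on new data", the sketch below fits a straight line to a few made-up observations by ordinary least squares and then predicts an input it never saw. The data points and the functions `fit_line` and `predict` are hypothetical and chosen only for clarity; real machine-learning systems, and deep learning in particular, learn far richer models.

```python
from __future__ import annotations


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn slope w and intercept b of y = w*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned line to a new input."""
    w, b = model
    return w * x + b


# Hypothetical "raw data": four observed (x, y) pairs.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)              # (2.0, 1.0)
print(predict(model, 5))  # 11.0, for an input not seen during training
```

The "learning" step here is just estimating two numbers from examples; the prediction step reuses them on unseen data, which is the same split between training and inference that larger models follow.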
The game of Go played between a DeepMind computer program and a human champion created an existential crisis of sorts for Marcus du Sautoy, a mathematician and professor at Oxford University. "I've always compared doing mathematics to playing the game of Go," he says, and Go is not supposed to be a game that a computer can easily play because it requires intuition and creativity. So when du Sautoy saw DeepMind's AlphaGo beat Lee Sedol, he thought that there had been a sea change in artificial intelligence that would impact other creative realms. He set out to investigate the role that AI can play in helping us understand creativity, and ended up writing The Creativity Code: Art and Innovation in the Age of AI (Harvard University Press). The Verge spoke to du Sautoy about different types of creativity, AI helping humans become more creative (instead of replacing them), and the creative fields where artificial intelligence struggles most.
Since AlphaGo and AlphaGo Zero achieved groundbreaking successes in the game of Go, the programs have been generalized to solve other tasks. Subsequently, AlphaZero was developed to play Go, Chess and Shogi. The algorithms are well explained in the literature. However, AlphaZero contains many parameters, and for none of AlphaGo, AlphaGo Zero or AlphaZero is there sufficient discussion of how to set their parameter values. Therefore, in this paper, we choose 12 parameters in AlphaZero and evaluate how these parameters contribute to training. We focus on three objectives (training loss, time cost and playing strength). For each parameter, we train 3 models using 3 different values (minimum value, default value, maximum value). We use 6×6 Othello as the test game, running on AlphaZeroGeneral, an open-source re-implementation of AlphaZero. Overall, experimental results show that different values can lead to different training results, demonstrating the importance of such a parameter sweep. We categorize these 12 parameters into time-sensitive parameters and time-friendly parameters. Moreover, through multi-objective analysis, this paper provides an insightful basis for further hyper-parameter optimization.
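The one-at-a-time sweep described above (each parameter set to its minimum, default and maximum value while the others stay at their defaults) can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the three parameter names, their ranges, `toy_train` (a stand-in for a real AlphaZeroGeneral training run that returns made-up but deterministic metrics) and the ratio-based `time_sensitive` criterion are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Config:
    # Hypothetical AlphaZero-style hyper-parameters with illustrative defaults.
    num_mcts_sims: int = 50
    cpuct: float = 1.0
    num_episodes: int = 100


# Hypothetical (minimum, maximum) values; the default comes from Config.
SWEEP_RANGES = {
    "num_mcts_sims": (10, 200),
    "cpuct": (0.5, 4.0),
    "num_episodes": (20, 200),
}


def toy_train(cfg: Config) -> dict[str, float]:
    """Placeholder for a real training run: returns invented, deterministic metrics."""
    time_cost = cfg.num_mcts_sims * cfg.num_episodes / 1000.0
    loss = 1.0 / (1.0 + cfg.num_episodes / 100.0) + 0.05 * abs(cfg.cpuct - 1.0)
    strength = min(1.0, cfg.num_mcts_sims / 200.0) * (1.0 - 0.1 * abs(cfg.cpuct - 1.0))
    return {"loss": loss, "time_cost": time_cost, "strength": strength}


def sweep(train: Callable[[Config], dict[str, float]]) -> dict[tuple[str, str], dict[str, float]]:
    """Vary one parameter at a time (min, default, max), holding the others at default."""
    default = Config()
    results: dict[tuple[str, str], dict[str, float]] = {}
    for name, (lo, hi) in SWEEP_RANGES.items():
        for label, value in (("min", lo), ("default", getattr(default, name)), ("max", hi)):
            results[(name, label)] = train(replace(default, **{name: value}))
    return results


def time_sensitive(results: dict[tuple[str, str], dict[str, float]], threshold: float = 2.0) -> set[str]:
    """Assumed criterion: flag a parameter if its slowest run exceeds threshold x its fastest."""
    flagged = set()
    for name in SWEEP_RANGES:
        times = [results[(name, label)]["time_cost"] for label in ("min", "default", "max")]
        if max(times) > threshold * min(times):
            flagged.add(name)
    return flagged


results = sweep(toy_train)
print(len(results))                    # 9 runs: 3 parameters x 3 values
print(sorted(time_sensitive(results)))  # ['num_episodes', 'num_mcts_sims']
```

Replacing `toy_train` with a function that actually trains and evaluates a model leaves the sweep logic unchanged; with 12 parameters it yields 36 runs, matching the 3 models per parameter described in the abstract.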
If we could shrink the entire history of our planet into one year, humans would have shown up at roughly 11pm on 31 December. In the grand scheme of things, we are insignificant. However, if we widen our view to the entire observable universe, our evolutionary success looks like a stroke of near-impossible luck, depending on every biological condition and chance event required for us to become the dominant species on this planet. Of the estimated 300 billion solar systems in the Milky Way, Earth is the only place where we know life exists. Out of an estimated 8.7 million species on Earth, we became the first general intelligence.
Imagine having to solve a jigsaw puzzle with 1 million pieces, without knowing what the final picture is supposed to look like. It is a challenge that computer designers and logistics planners grapple with every day. Now a version of DeepMind's game-playing artificial intelligence can come up with a more efficient solution. The method might have applications in networking problems including routing traffic through cities, couriering deliveries across a country and designing more efficient computer chips.
We argue that actor-critic algorithms are currently limited by their need for an on-policy critic, which severely constrains how the critic is learned. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free actor-critic reinforcement-learning algorithm for continuous states and discrete actions, with off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we show approximates Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable and, contrary to other state-of-the-art algorithms, unusually forgiving for poorly-configured …
PGQL (O'Donoghue et al., 2017) allows for an off-policy V function, but requires it to be combined with on-policy advantage values. Notable examples of algorithms without an on-policy critic are AlphaGo Zero (Silver et al., 2017), which replaces the critic with a slow-moving target policy learned with tree search, and the Actor-Mimic (Parisotto et al., 2016), which minimizes the cross-entropy between an actor and the Softmax policies of critics (see Section 4.2). The need of most actor-critic algorithms for an on-policy critic makes them incompatible with state-of-the-art value-based algorithms of the Q-Learning family (Arjona-Medina et al., 2018; Hessel et al., 2017), which are all highly sample-efficient but off-policy. In a discrete-actions setting, where off-policy value-based methods can be used, this raises two questions:
1. Can we use off-policy value-based algorithms in an actor-critic setting?
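BDPI's central idea, as the abstract states it, is an actor that slowly imitates the average greedy policy of several off-policy critics. The following is a simplified tabular sketch under explicit assumptions: states are small integer indices, the critics are Q-tables trained with plain Q-learning rather than neural networks, and the actor learning rate `lr` and the update rule π ← (1 − lr)·π + lr·mean_i greedy(Q_i) are our hypothetical reading of "slowly imitating", not code from the paper.

```python
from __future__ import annotations


def greedy(q_row: list[float]) -> list[float]:
    """One-hot greedy policy for one state (ties go to the lowest action index)."""
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_row))]


def critic_update(q: list[list[float]], s: int, a: int, r: float, s_next: int,
                  done: bool, alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Off-policy Q-learning step on one replayed transition (tabular stand-in for a critic)."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def actor_update(actor_probs: list[float], critic_rows: list[list[float]],
                 lr: float = 0.05) -> list[float]:
    """Move the actor's action distribution a small step toward the critics' average greedy policy."""
    n = len(actor_probs)
    target = [0.0] * n
    for q_row in critic_rows:
        for a, p in enumerate(greedy(q_row)):
            target[a] += p / len(critic_rows)
    return [(1.0 - lr) * p + lr * t for p, t in zip(actor_probs, target)]


# Hypothetical example: two critics' Q-values for the same state, both preferring action 0.
probs = actor_update([0.5, 0.5], [[1.0, 0.0], [2.0, 0.5]], lr=0.1)
print(probs)  # probability mass shifts slightly toward action 0
```

Because the critics only ever learn from replayed transitions with a max-based target, they never depend on the actor's current policy; the actor in turn only reads the critics' greedy choices, which is the decoupling the abstract credits for BDPI's stability.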
A startup called CogitAI has developed a platform that lets companies use reinforcement learning, the technique that gave AlphaGo mastery of the board game Go. Gaining experience: AlphaGo, an AI program developed by DeepMind, taught itself to play Go by practicing. It's practically impossible for a programmer to manually code in the best strategies for winning. Instead, reinforcement learning let the program figure out how to defeat the world's best human players on its own. Drug delivery: Reinforcement learning is still an experimental technology, but it is gaining a foothold in industry.
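To make "figuring it out on its own" concrete, here is a minimal tabular Q-learning sketch on a hypothetical five-cell corridor where the only reward comes from reaching the last cell. It is not AlphaGo's method, which combined deep neural networks, Monte Carlo tree search and self-play; the environment, the function names and the hyper-parameter values are all illustrative assumptions. It only shows an agent improving through practice and reward rather than hand-coded strategy.

```python
from __future__ import annotations

import random

N_STATES = 5        # cells 0..4 of a hypothetical corridor; cell 4 is the goal
ACTIONS = (0, 1)    # 0 = step left, 1 = step right


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Toy environment: reward 1.0 only when the agent reaches the goal cell."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose(q_row: list[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning: improve action-value estimates purely from trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the length of each practice episode
            a = choose(q[s], epsilon, rng)
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action learned for each non-terminal cell
```

After training, the greedy policy should step right in every cell, even though nothing in the code says "go right"; the preference emerges from the reward signal alone.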
Recently we've been seeing computers play games against humans, either as bots in multiplayer games or as opponents in games such as Dota 2, PUBG and Mario. DeepMind, a research company, made history in 2016 when its AlphaGo program defeated the South Korean Go world champion. If you're a keen gamer, you have probably heard about the OpenAI Five Dota 2 matches, in which machines played against humans and defeated some of the world's top Dota 2 players in a few matches. So here's the central question: why do we need reinforcement learning? Is it only used for games?