If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
An Oculus co-founder has designed a drone that is capable of seeking out its targets and ramming them head on in order to destroy them. In a video demonstration, Interceptor seeks out its opponent and charges at it at 100 miles per hour, ultimately hurtling both of them to the ground. The company claims it is capable of neutralizing threats in any environment, day or night, and according to its creator, the device 'almost always survives and returns to base.' Interceptor is the brainchild of Anduril, which was founded by Palmer Luckey, who also co-founded Oculus, the Facebook-owned company that designs virtual reality technology. 'The best way to kill fast drones piloted by hostile humans is with even faster drones piloted by AI!' said Luckey on Twitter. 'The United States cannot allow the skies of the world to turn into the Wild West, our ability to take out aerial threats in a matter of seconds is part of the solution.'
Collaboration and competition are two of the key pillars of the evolution of human societies and essential to our evolution as a species. Billions of people inhabit our planet, grouped in millions of communities, each with its own beliefs about politics, economics, religion, social justice, or sports. While those beliefs make each of us unique, they haven't prevented us from coming together to achieve amazing things. Those group efforts are typically guided by the cooperative and competitive dynamics among the members of each group, which constitute the foundation of collective intelligence. From this perspective, every area of human knowledge can be traced back to a collaborative and/or competitive dynamic in a specific community.
This is Samuel's seminal paper, originally published in 1959, in which he sets out to build a program that can learn to play the game of checkers. Checkers is an extremely complex game: it has roughly 500 billion billion possible positions, so a brute-force-only approach to solving it is not satisfactory. Samuel's program was based on Claude Shannon's minimax strategy for finding the best move from a given position. In this paper he describes how a machine could look ahead "by evaluating the resulting board positions much as a human player might do".
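The minimax idea mentioned above can be illustrated with a minimal sketch. This is not Samuel's program: the game tree here is a hypothetical nested list whose leaves stand in for static board-evaluation scores, and the names `minimax` and `best_move` are chosen only for illustration.

```python
from typing import List, Union

# Hypothetical game tree: a leaf is an int (a static evaluation of a board
# position from the maximizing player's point of view); an inner node is a
# list of child positions reachable in one move.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the value of `node` assuming both sides play optimally."""
    if isinstance(node, int):
        # Look-ahead stops here: use the static evaluation.
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(root: List[Tree]) -> int:
    """Index of the move at the root that maximizes the minimax value."""
    # After our move, the opponent (the minimizer) is to play.
    scores = [minimax(child, False) for child in root]
    return max(range(len(scores)), key=scores.__getitem__)


# Three candidate moves, each answered by one of two opponent replies.
tree = [[3, 5], [2, 9], [0, 7]]
print(best_move(tree), minimax(tree, True))
```

In this toy tree the opponent's best replies leave the three moves worth 3, 2 and 0, so the first move is chosen. A real checkers player would generate the children from board positions and cut the search off at a fixed depth rather than store the whole tree.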
Bots removed opponents' tools from the game space, and launched themselves into the air… Two teams of AI agents tasked with playing a game (or a million) of hide and seek in a virtual environment developed complex strategies and counterstrategies, and exploited holes in their environment that even its creators did not know it had. The game was part of an experiment by OpenAI designed to test the AI skills that emerge from multi-agent competition and standard reinforcement learning algorithms at scale. OpenAI described the outcome in a striking paper published this week. The organisation, now heavily backed by Microsoft, described the outcome as further proof that "skills, far more complex than the seed game dynamics and environment, can emerge" (from such experiments/training exercises). Some of its findings are neatly captured in the video below.
This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward could depend on the uncertain opponent type (private information known only to the opponent itself and its ally). In order to maximize the reward, the protagonist agent has to infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategy interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training methods can be used to improve the robustness of the agent policy against different opponents, but they also significantly increase the computational overhead. In order to achieve a good trade-off between the robustness of the learned policy and the computational complexity, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist agent policy due to the intrinsic stochastic nature of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme to dynamically update the opponent ensemble to optimize an objective function that strikes a balance between robustness and computational complexity. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.
This paper introduces a new negotiating agent model for automated negotiation. We focus on applications without time pressure, with multidimensional negotiation on both continuous and discrete domains. The agent's bidding strategy relies on Monte Carlo Tree Search, a popular method since it has been used with success on games with a high branching factor such as Go. It also exploits opponent modeling techniques based on Gaussian process regression and Bayesian learning. Evaluation is done by pitting our agent against the existing agents that are able to negotiate in such a context: Random Walker, Tit-for-tat and Nice Tit-for-Tat. None of those agents succeeds in beating our agent. Also, the modular and adaptive nature of our approach is a huge advantage when it comes to optimizing it in specific application contexts.
Multi-agent reinforcement learning (MARL) extends (single-agent) reinforcement learning (RL) by introducing additional agents and (potentially) partial observability of the environment. Consequently, algorithms for solving MARL problems incorporate various extensions beyond traditional RL methods, such as a learned communication protocol between cooperative agents that enables exchange of private information or adaptive modeling of opponents in competitive settings. One popular algorithmic construct is a memory mechanism such that an agent's decisions can depend not only upon the current state but also upon the history of observed states and actions. In this paper, we study how a memory mechanism can be useful in environments with different properties, such as observability, internality and presence of a communication channel. Using both prior work and new experiments, we show that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however, we must be cautious of agents achieving effective memoryfulness through other means.
In the first part, we explored how Bayesian statistics might be used to make reinforcement learning less data-hungry. Now we execute this idea in a simple example, using TensorFlow Probability to implement our model. When it comes to games, it is difficult to imagine something simpler than rock, paper, scissors. Despite the simplicity, googling the game reveals a remarkable body of literature. We want to use Bayesian statistics to play this game and exploit the biases of a human opponent.
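To make the idea of exploiting an opponent's biases concrete, here is a minimal sketch in plain Python. It does not use TensorFlow Probability and is not the article's model: it assumes a simple conjugate Dirichlet prior over the opponent's move frequencies, and the names `posterior_mean`, `expected_payoff` and `best_response` are hypothetical.

```python
MOVES = ("rock", "paper", "scissors")
# BEATS[m] is the move that m defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def posterior_mean(counts, prior=1.0):
    """Posterior mean of the opponent's move probabilities.

    Assumes a symmetric Dirichlet(prior) prior and treats the observed
    moves as independent draws, which is a simplifying assumption.
    """
    total = sum(counts[m] + prior for m in MOVES)
    return {m: (counts[m] + prior) / total for m in MOVES}


def expected_payoff(my_move, probs):
    """Expected score of my_move: +1 for a win, -1 for a loss, 0 for a draw."""
    win = probs[BEATS[my_move]]
    lose = sum(probs[m] for m in MOVES if BEATS[m] == my_move)
    return win - lose


def best_response(counts, prior=1.0):
    """Move with the highest expected payoff under the posterior mean."""
    probs = posterior_mean(counts, prior)
    return max(MOVES, key=lambda m: expected_payoff(m, probs))


# An opponent who has favoured rock so far.
observed = {"rock": 6, "paper": 2, "scissors": 1}
print(best_response(observed))
```

With these counts the posterior puts most of its mass on rock, so the sketch answers with paper. The prior keeps the estimate from collapsing onto a single move after only a few observations, which is the sense in which a Bayesian treatment is less data-hungry.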
Google's DeepMind AI division will likely end up making the next generation of military killbots, but before then, at least they'll provide new challenges for the esports crowd. To make sure it wasn't a fluke, they've unleashed AlphaStar on the European public. According to this official blog post, AlphaStar is limited to Europe for now. StarCraft II players can opt for a chance to have their next 1v1 partner swapped out for an unfeeling machine that's less likely to insult your mother. The good news is that AlphaStar isn't going to be learning bad habits and worse language from StarCraft II's player population.
Fractals2019 started as a new experimental entry in the RoboCup Soccer 2D Simulation League, based on Gliders2d code base, and advanced to a team winning RoboCup-2019 championship. Our approach is centred on combinatorial optimisation methods, within the framework of Guided Self-Organisation (GSO), with the search guided by local constraints. We present examples of several tactical tasks based on the fully released Gliders2d code (version v2), including the search for an optimal assignment of heterogeneous player types, as well as blocking behaviours, offside trap, and attacking formations. We propose a new method, Dynamic Constraint Annealing, for solving dynamic constraint satisfaction problems, and apply it to optimise thermodynamic potential of collective behaviours, under dynamically induced constraints.
1 Introduction
The RoboCup Soccer 2D Simulation League provides a rich dynamic environment, facilitated by the RoboCup Soccer Simulator (RCSS), aimed to test advances in decentralised collective behaviours of autonomous agents. The challenges include concurrent adversarial actions, computational nondeterminism, noise and latency in asynchronous perception and actuation, and limited processing time [1-9]. Over the years the progress of the League has been supported by several important base code releases, covering both low-level skills and standardised world models of simulated agents [10-13]. The release in 2010 of the base code of HELIOS team, agent2d-3.0.0, later upgraded to agent2d-3.1.1,