Agents
Can AI simulations predict the future?
The recent U.S. backflip on Syria has certainly not helped the nation's residents. Before the Syrian Civil War began in 2011, the estimated population was 22 million; today it is roughly five million fewer, with another six million "internally displaced." With Turkey launching an invasion, we can expect more Syrian citizens to become refugees. Beyond the occasional news feature from inside refugee camps, you hear very little about where Syrians end up, except when far-right leaders demand it not be in their backyard. How can you tell whether they will successfully integrate into the foreign populations they must seek aid from?
SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition
Mao, Hangyu, Wang, Chao, Hao, Xiaotian, Mao, Yihuan, Lu, Yiming, Wu, Chengjie, Hao, Jianye, Li, Dong, Tang, Pingzhong
The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards. To address this challenge, we present SEIHAI, a Sample-efficient Hierarchical AI that takes full advantage of the human demonstrations and the task structure. Specifically, we split the task into several sequentially dependent subtasks and train a suitable agent for each subtask using reinforcement learning and imitation learning. We further design a scheduler to select different agents for different subtasks automatically. SEIHAI took first place in the preliminary and final rounds of the NeurIPS-2020 MineRL competition.
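The idea of a scheduler that hands control to one agent per sequentially dependent subtask can be illustrated with a minimal sketch. This is not the authors' scheduler: the `Scheduler` class, the inventory-threshold rule, the stage names, and the toy policies are all hypothetical, chosen only to show how a stage index that never moves backwards can pick the active sub-agent.

```python
from typing import Callable, Dict, List, Tuple

Inventory = Dict[str, int]
Policy = Callable[[Inventory], str]  # hypothetical: maps an inventory to an action name


class Scheduler:
    """Hypothetical rule-based scheduler over sequentially dependent subtasks.

    Each stage is (milestone item, required count, policy). A stage counts as
    done once its milestone has been observed; the index only advances, so a
    later stage stays active even if earlier items are consumed by crafting.
    """

    def __init__(self, stages: List[Tuple[str, int, Policy]]) -> None:
        self.stages = stages
        self.index = 0

    def select(self, inventory: Inventory) -> Policy:
        # Skip every stage whose milestone is already satisfied.
        while self.index < len(self.stages) - 1:
            item, needed, _ = self.stages[self.index]
            if inventory.get(item, 0) < needed:
                break
            self.index += 1
        return self.stages[self.index][2]


# Toy stand-ins for trained sub-agents (assumptions, not real MineRL policies).
def chop(inv: Inventory) -> str:
    return "attack_tree"


def craft(inv: Inventory) -> str:
    return "craft_planks"


def mine(inv: Inventory) -> str:
    return "dig_down"


scheduler = Scheduler([("log", 4, chop), ("planks", 8, craft), ("diamond", 1, mine)])
```

In a real system each policy would be a trained network acting on image observations, and the switching rule could itself be learned; the fixed thresholds here are placeholders.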
The Partially Observable History Process
Morrill, Dustin, Greenwald, Amy R., Bowling, Michael
We introduce the partially observable history process (POHP) formalism for reinforcement learning. POHP centers around the actions and observations of a single agent and abstracts away the presence of other players without reducing them to stochastic processes. Our formalism provides a streamlined interface for designing algorithms that defy categorization as exclusively single or multi-agent, and for developing theory that applies across these domains. We show how the POHP formalism unifies traditional models including the Markov decision process, the Markov game, the extensive-form game, and their partially observable extensions, without introducing burdensome technical machinery or violating the philosophical underpinnings of reinforcement learning. We illustrate the utility of our formalism by concisely exploring observable sequential rationality, re-deriving the extensive-form regret minimization (EFR) algorithm, and examining EFR's theoretical properties in greater generality.
Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics
Schubert, Ingmar, Driess, Danny, Oguz, Ozgur S., Toussaint, Marc
Applications of Reinforcement Learning (RL) in robotics are often limited by high data demand. On the other hand, approximate models are readily available in many robotics scenarios, making model-based approaches like planning a data-efficient alternative. Still, the performance of these methods suffers if the model is imprecise or wrong. In this sense, the respective strengths and weaknesses of RL and model-based planners are complementary. In the present work, we investigate how both approaches can be integrated into one framework that combines their strengths. We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans. In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
AI in Games: Techniques, Challenges and Opportunities
Yin, Qiyue, Yang, Jun, Ni, Wancheng, Liang, Bin, Huang, Kaiqi
With the breakthrough of AlphaGo, AI in human-computer games has become a very hot topic attracting researchers all around the world, and such games usually serve as an effective standard for testing artificial intelligence. Various game AI systems (AIs) have been developed, such as Libratus, OpenAI Five and AlphaStar, that beat professional human players. In this paper, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games for the intelligent decision making field; 2) illustrate the mainstream frameworks and techniques for developing professional-level AIs; 3) raise the challenges or drawbacks in the current AIs for intelligent decision making; and 4) try to propose future trends in games and intelligent decision making techniques. Finally, we hope this brief review can provide an introduction for beginners and inspire insights for researchers in the field of AI in games.
A Survey on AI Assurance
Batarseh, Feras A., Freeman, Laura
Artificial Intelligence (AI) algorithms are increasingly providing decision making and operational support across multiple domains. AI includes a wide library of algorithms for different problems. One important notion for the adoption of AI algorithms into operational decision processes is the concept of assurance. The literature on assurance, unfortunately, conceals its outcomes within a tangled landscape of conflicting approaches, driven by contradicting motivations, assumptions, and intuitions. Accordingly, although AI assurance is a rising and novel area, this manuscript provides a systematic review of research works relevant to it, published between the years 1985 and 2021, and aims to provide a structured alternative to the landscape. A new AI assurance definition is adopted and presented, and assurance methods are contrasted and tabulated. Additionally, a ten-metric scoring system is developed and introduced to evaluate and compare existing methods. Lastly, in this manuscript, we provide foundational insights, discussions, future directions, a roadmap, and applicable recommendations for the development and deployment of AI assurance.
Stefano Somenzi, Athics: On no-code AI and deploying conversational bots
No-code AI solutions are helping more businesses to get started on their AI journeys than ever. AI News caught up with Stefano Somenzi, CTO at Athics, to get his thoughts on no-code AI and the development of virtual agents. AI News: Do you think "no-code" will help more businesses to begin their AI journeys? Stefano Somenzi: The real advantage of "no code" is not just the reduced effort required for businesses to get things done, it is also centered around changing the role of the user who will build the AI solution. "No code" means that the AI solution is built not by a data scientist but by the process owner.
Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control
Chennakesavalu, Shriram, Rotskoff, Grant M.
Experimental advances enabling high-resolution external control create new opportunities to produce materials with exotic properties. In this work, we investigate how a multi-agent reinforcement learning approach can be used to design external control protocols for self-assembly. We find that a fully decentralized approach performs remarkably well even with a "coarse" level of external control. More importantly, we see that a partially decentralized approach, where we include information about the local environment, allows us to better control our system towards some target distribution. We explain this by analyzing our approach as a partially-observed Markov decision process. With a partially decentralized approach, the agent is able to act more presciently, both by preventing the formation of undesirable structures and by better stabilizing target structures as compared to a fully decentralized approach.
Competing Models
Olea, Jose Luis Montiel, Ortoleva, Pietro, Pai, Mallesh M, Prat, Andrea
Different agents need to make a prediction. They observe identical data, but have different models: they predict using different explanatory variables. We study which agent believes they have the best predictive ability -- as measured by the smallest subjective posterior mean squared prediction error -- and show how it depends on the sample size. With small samples, we present results suggesting it is an agent using a low-dimensional model. With large samples, it is generally an agent with a high-dimensional model, possibly including irrelevant variables, but never excluding relevant ones. We apply our results to characterize the winning model in an auction of productive assets, to argue that entrepreneurs and investors with simple models will be over-represented in new sectors, and to understand the proliferation of "factors" that explain the cross-sectional variation of expected stock returns in the asset-pricing literature.
Winning Solution of the AIcrowd SBB Flatland Challenge 2019-2020
This report describes the main ideas of the solution which won the AIcrowd SBB Flatland Challenge 2019-2020, with a score of 99% (meaning that, on average, 99% of the agents were routed to their destinations within the allotted time steps). The details of the task can be found on the competition's website. The solution consists of 2 major components: 1) a component which (re-)generates paths over a time-expanded graph for each agent; and 2) a component which updates the agent paths after a malfunction occurs, in order to try to preserve the same agent ordering of entering each cell as before the malfunction. The goal of the second component is twofold: a) to (try to) avoid deadlocks; and b) to bring the system back to a consistent state (where each agent has a feasible path over the time-expanded graph). I discuss both of these components, as well as a series of potentially promising but unexplored ideas, below. The invariant for the path-generation component is that every agent always has an assigned path (where it will be located at each time step over the whole time horizon), and this component only tries to improve the overall path assignment. Initially, all the agents have a default path assigned which doesn't enter the environment at all (they always just stay at their initial location, outside the environment).
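To make the notion of a path over a time-expanded graph concrete, here is a minimal sketch, not the winning solution's code. It assumes a simplified setting: the rail network is a plain adjacency dict, other agents' paths are given as a set of reserved `(cell, time)` pairs, and only cell-occupancy conflicts are checked (no swap conflicts, malfunctions, or speed profiles). The function name `plan_path` and all parameters are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

Cell = Hashable
Node = Tuple[Cell, int]  # a node of the time-expanded graph: (cell, time step)


def plan_path(
    neighbors: Dict[Cell, List[Cell]],
    start: Cell,
    goal: Cell,
    reserved: Set[Node],
    horizon: int,
) -> Optional[List[Cell]]:
    """Breadth-first search over (cell, time) nodes.

    Returns the earliest-arrival path as a list of cells indexed by time
    step, or None if the goal cannot be reached within `horizon` steps.
    """
    if (start, 0) in reserved:
        return None
    parent: Dict[Node, Optional[Node]] = {(start, 0): None}
    queue = deque([(start, 0)])
    while queue:
        cell, t = queue.popleft()
        if cell == goal:
            # Walk parent links back to time 0 to recover the path.
            path = []
            node: Optional[Node] = (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t == horizon:
            continue
        # An agent may wait in place or move to an adjacent cell.
        for nxt in [cell] + list(neighbors[cell]):
            node = (nxt, t + 1)
            if node in parent or node in reserved:
                continue
            parent[node] = (cell, t)
            queue.append(node)
    return None


# Three cells in a row; another agent occupies B at time step 1.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
free_path = plan_path(line, "A", "C", set(), horizon=5)
delayed_path = plan_path(line, "A", "C", {("B", 1)}, horizon=5)
```

With no reservations the agent moves straight through; with `B` reserved at time 1 it waits one step at `A` first. After planning, an agent's own `(cell, time)` pairs would be added to `reserved` before the next agent is planned, which is one simple way to keep every assigned path feasible.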