Planning & Scheduling

Revolutionary Warfare: The AI of Total War (Part 3)


The core systems of Total War have been established and redefined over the course of the franchise, a point I discussed in the first two parts of this series, but there is always a need to strive for better. RTS games continue to be one of the most demanding domains for AI to operate within, and as such we seek new inspiration from outside of game AI practices. With this in mind, I will be taking a look at 2013's Total War: Rome II, one of the most important games in the franchise when it comes to the design and development of AI practices. So let's take a look at what happened behind the scenes and what makes Rome II such a critical step in Total War's future progression. In part 2 of this series we concluded with an overview of the dramatic changes to the underlying AI systems in Total War with the release of Empire in 2009, followed by Napoleon in 2010.

RWD & AI in the Journals: June top 10


MHRA/CPRD – publish on data use for PV, including updated capabilities and data reach (CPRD Aurum based on EMIS; fully linked data for 15.9m unique patients)
Kaiser – on the STOP CRC trial, a 100% EHR-driven trial (recruitment, data collection), and the challenges in recruitment "reach" achieved by leveraging the EHR

Unlocking The Value Of Artificial Intelligence For Retailers - Retail TouchPoints


Competition for good workers is tight, and employees' expectations of their jobs have never been higher. They want an inspirational workplace where they feel motivated to be loyal, productive and engaged. Among many things, that means keeping up with technology. Giving retail teams access to leading-edge tech that uses AI and machine learning will provide them -- and you -- insights not previously available, increasing productivity and helping morale. For example, modern workforce management can empower employees with preferred scheduling options and flexible clocking.

PACMAN: A Planner-Actor-Critic Architecture for Human-Centered Planning and Learning Artificial Intelligence

Conventional reinforcement learning (RL) allows an agent to learn policies via environmental rewards only, with a long and slow learning curve at the beginning stage. In contrast, human learning is usually much faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning (PACMAN), where an agent uses its prior, high-level, deterministic symbolic knowledge to plan for goal-directed actions, while integrating the Actor-Critic algorithm of RL to fine-tune its behaviors towards both environmental rewards and human feedback. This is the first unified framework where knowledge-based planning, RL, and human teaching jointly contribute to the policy learning of an agent. Our experiments demonstrate that PACMAN leads to a significant jump start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent and misleading feedback.

Ordinal Bucketing for Game Trees using Dynamic Quantile Approximation Artificial Intelligence

In this paper, we present a simple and cheap ordinal bucketing algorithm that approximately generates $q$-quantiles from an incremental data stream. The bucketing is done dynamically in the sense that the number of buckets $q$ increases with the number of seen samples. We show how this can be used in Ordinal Monte Carlo Tree Search (OMCTS) to yield better bounds on time and space complexity, especially in the presence of noisy rewards. Besides complexity analysis and quality tests of quantiles, we evaluate our method using OMCTS in the General Video Game Framework (GVGAI). Our results demonstrate its dominance over vanilla Monte Carlo Tree Search in the presence of noise, where OMCTS without bucketing has very poor time and space complexity.
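To make the idea of buckets that grow with the sample count concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the class name `GrowingQuantileBuckets` is hypothetical, the growth rule `q = floor(sqrt(n))` is an assumption chosen only for illustration, and the sketch stores every sample and computes exact quantiles, which a cheap streaming approximation would avoid.

```python
import bisect
import math


class GrowingQuantileBuckets:
    """Exact q-quantile buckets over a stream, with q growing as samples arrive."""

    def __init__(self):
        # Kept sorted. A real streaming method would not store every sample.
        self.samples = []

    def add(self, value):
        bisect.insort(self.samples, value)

    def num_buckets(self):
        # Assumed growth rule (not from the paper): q = floor(sqrt(n)), at least 1.
        return max(1, math.isqrt(len(self.samples)))

    def boundaries(self):
        # The q - 1 cut points that split the sorted samples into q near-equal groups.
        n, q = len(self.samples), self.num_buckets()
        return [self.samples[(i * n) // q] for i in range(1, q)]

    def bucket_of(self, value):
        # Index of the bucket that `value` falls into, from 0 to q - 1.
        return bisect.bisect_right(self.boundaries(), value)


buckets = GrowingQuantileBuckets()
for reward in range(1, 17):  # 16 samples, so q = 4 under the assumed rule
    buckets.add(reward)
print(buckets.num_buckets(), buckets.boundaries())  # prints: 4 [5, 9, 13]
```

Comparing rewards by bucket index rather than by raw value is what makes the treatment ordinal: only the rank of a reward within the stream matters, not its magnitude.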

Multiple Policy Value Monte Carlo Tree Search Artificial Intelligence

Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms a single PV-NN with policy value MCTS, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.
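The budget trade-off between a small and a large network can be illustrated with simple arithmetic. The Python sketch below only counts how many network evaluations fit into a fixed budget. It does not show how MPV-MCTS combines the two search trees, which the abstract does not describe. The function names and all cost figures are hypothetical.

```python
def simulations_within_budget(budget, cost_per_eval):
    """Number of network evaluations (one per simulation) that fit in the budget."""
    return budget // cost_per_eval


def split_budget(budget, cost_small, cost_large, share_small):
    """Split one budget between a small and a large network.

    share_small is the fraction of the budget given to the small network.
    Returns (simulations with the small net, simulations with the large net).
    """
    budget_small = int(budget * share_small)
    budget_large = budget - budget_small
    return (simulations_within_budget(budget_small, cost_small),
            simulations_within_budget(budget_large, cost_large))


# Hypothetical costs: the large network is 8x as expensive per evaluation.
only_small = simulations_within_budget(800, 1)   # 800 simulations
only_large = simulations_within_budget(800, 8)   # 100 simulations
mixed = split_budget(800, cost_small=1, cost_large=8, share_small=0.5)
print(only_small, only_large, mixed)  # prints: 800 100 (400, 50)
```

With these made-up numbers, an even split still buys 400 cheap simulations for breadth while keeping 50 accurate evaluations from the large network.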

Guarantees for Sound Abstractions for Generalized Planning (Extended Paper) Artificial Intelligence

Generalized planning is about finding plans that solve collections of planning instances, often infinite collections, rather than single instances. Recently it has been shown how to reduce the planning problem for generalized planning to the planning problem for a qualitative numerical problem, the latter being a reformulation that simultaneously captures all the instances in the collection. An important thread of research thus consists in finding such reformulations, or abstractions, automatically. A recent proposal learns the abstractions inductively from a finite and small sample of transitions from instances in the collection. However, as in all inductive processes, the learned abstraction is not guaranteed to be correct for the whole collection. In this work we address this limitation by performing an analysis of the abstraction with respect to the collection, and show how to obtain formal guarantees for generalization. These guarantees, in the form of first-order formulas, may be used to 1) define subcollections of instances on which the abstraction is guaranteed to be sound, 2) obtain necessary conditions for generalization under certain assumptions, and 3) do automated synthesis of complex invariants for planning problems. Our framework is general: it can be extended or combined with other approaches, and it has applications that go beyond generalized planning.

Balancing Goal Obfuscation and Goal Legibility in Settings with Cooperative and Adversarial Observers Artificial Intelligence

In order to be useful in the real world, AI agents need to plan and act in the presence of others, who may include adversarial and cooperative entities. In this paper, we consider the problem where an autonomous agent needs to act in a manner that clarifies its objectives to cooperative entities while preventing adversarial entities from inferring those objectives. We show that this problem is solvable when cooperative entities and adversarial entities use different types of sensors and/or prior knowledge. We develop two new solution approaches for computing such plans. One approach provides an optimal solution to the problem by using an IP solver to provide maximum obfuscation for adversarial entities while providing maximum legibility for cooperative entities in the environment, whereas the other approach provides a satisficing solution using heuristic-guided forward search to achieve preset levels of obfuscation and legibility for adversarial and cooperative entities respectively. We show the feasibility and utility of our algorithms through extensive empirical evaluation on problems derived from planning benchmarks.

Q&A: Travel startup paves way for industry consolidation (Includes interview)


In May 2019 Google announced the consolidation of all its travel features. Google Maps, Trips, Hotels and Flights will combine to make one Google Travel, easing the process for vacation planning. Travel startup VacationRenter, which launched last year, pioneered this model for vacation rentals, based on an artificial intelligence-driven platform. According to VacationRenter's newly appointed COO, ex-Googler Marco del Rosario, both Google Travel and VacationRenter are early adopters of a pivotal strategy for today's travel technology: consolidation. Digital Journal: How has the world of travel changed in recent years?

Minimizing the Negative Side Effects of Planning with Reduced Models Artificial Intelligence

Reduced models of large Markov decision processes accelerate planning by considering a subset of outcomes for each state-action pair. This reduction in reachable states leads to replanning when the agent encounters states without a precomputed action during plan execution. However, not all states are suitable for replanning. In the worst case, the agent may not be able to reach the goal from the newly encountered state. Agents should be better prepared to handle such risky situations and avoid replanning in risky states. Hence, we consider replanning in states that are unsafe for deliberation as a negative side effect of planning with reduced models. While the negative side effects can be minimized by always using the full model, this defeats the purpose of using reduced models. The challenge is to plan with reduced models, but somehow account for the possibility of encountering risky situations. An agent should thus only replan in states that the user has approved as safe for replanning. To that end, we propose planning using a portfolio of reduced models, a planning paradigm that minimizes the negative side effects of planning using reduced models by alternating between different outcome selection approaches. We empirically demonstrate the effectiveness of our approach on three domains: an electric vehicle charging domain using real-world data from a university campus and two benchmark planning problems.
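The following Python sketch shows how a reduced model can leave an agent without a precomputed action. It assumes one simple outcome selection rule, keeping the `k` most likely outcomes per state-action pair, and it does not implement the portfolio approach proposed in the paper. The toy domain, the function names, and the probabilities are all hypothetical.

```python
def reduce_model(transitions, k):
    """Keep the k most likely outcomes per (state, action) and renormalize."""
    reduced = {}
    for key, outcomes in transitions.items():
        kept = sorted(outcomes, key=lambda o: o[1], reverse=True)[:k]
        total = sum(p for _, p in kept)
        reduced[key] = [(s, p / total) for s, p in kept]
    return reduced


def successor_states(transitions):
    """All states that appear as an outcome of some (state, action) pair."""
    return {s for outcomes in transitions.values() for s, _ in outcomes}


# Hypothetical toy domain: driving usually works, but the car may break down.
full_model = {
    ("home", "drive"): [("work", 0.9), ("breakdown", 0.1)],
    ("work", "drive"): [("home", 1.0)],
}
reduced = reduce_model(full_model, k=1)

# States the agent can actually reach but that the reduced model never planned for.
unplanned = successor_states(full_model) - successor_states(reduced)
print(unplanned)  # prints: {'breakdown'}
```

If the agent lands in `breakdown` during execution it has to replan on the spot, which is exactly the situation the abstract treats as a negative side effect when the state is unsafe for deliberation.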