 Planning & Scheduling


Ordinal Monte Carlo Tree Search

arXiv.org Artificial Intelligence

In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals 1 and losing equals -1, cannot be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings, such as setting the value of a loss to -0.5, which is often done in practice to encourage learning. It is hard to argue about good rewards, and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm to solve MDPs, highlight a recurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework, we show a dominance of our newly proposed ordinal MCTS algorithm over preference-based MCTS, vanilla MCTS and various other MCTS variants.
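The encoding sensitivity described in this abstract can be illustrated with a small sketch. This is not the paper's ordinal MCTS algorithm; it only contrasts a mean-of-rewards choice with a purely ordinal pairwise comparison. The action names, rollout outcomes, and helper functions are all hypothetical.

```python
from itertools import product

# Hypothetical rollout outcomes for two actions (illustrative data only).
# Outcomes are ordinal: "loss" < "draw" < "win".
ROLLOUTS = {
    "safe": ["draw"] * 10,
    "risky": ["win"] * 4 + ["loss"] * 6,
}
RANK = {"loss": 0, "draw": 1, "win": 2}


def best_by_mean(encoding):
    """Pick the action with the highest mean reward under a numeric encoding."""
    means = {
        action: sum(encoding[o] for o in outcomes) / len(outcomes)
        for action, outcomes in ROLLOUTS.items()
    }
    return max(means, key=means.get)


def pairwise_win_rate(a, b):
    """Chance that a random outcome of `a` outranks one of `b` (ties count half)."""
    score = 0.0
    for x, y in product(ROLLOUTS[a], ROLLOUTS[b]):
        if RANK[x] > RANK[y]:
            score += 1.0
        elif RANK[x] == RANK[y]:
            score += 0.5
    return score / (len(ROLLOUTS[a]) * len(ROLLOUTS[b]))


def best_by_order():
    """Pick the action that most often outranks the others; uses ranks only."""
    return max(
        ROLLOUTS,
        key=lambda a: sum(pairwise_win_rate(a, b) for b in ROLLOUTS if b != a),
    )


print(best_by_mean({"win": 1.0, "draw": 0.0, "loss": -1.0}))  # loss encoded as -1
print(best_by_mean({"win": 1.0, "draw": 0.0, "loss": -0.5}))  # loss encoded as -0.5
print(best_by_order())                                        # no encoding involved
```

With this made-up data, the mean-based choice flips when the loss value changes from -1 to -0.5, while the ordinal choice depends only on the ranking of outcomes and therefore cannot change under any order-preserving re-encoding.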


Telemedicine logistics: network optimization using artificial intelligence - MedCity News

#artificialintelligence

Logistics is something we traditionally associate with the trucking or the package delivery industry. In fact, a recent article in the Economist estimates that delivering 25 packages allows for roughly 15 septillion (trillion trillion) possible routes. That's why many companies dealing in complicated webs of variables like this are turning to new technologies like artificial intelligence to help streamline and optimize their operations. What if we took the concepts behind shipping logistics and applied them to the healthcare space? Imagine a healthcare organization with multiple locations, each staffed with providers across multiple specialties--individuals who are not interchangeable--operating under a wide range of room availability and scheduling constraints.
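The route figure can be checked with a short calculation, assuming (the excerpt does not say so) that a "route" means one ordering of the 25 delivery stops, so the count is 25 factorial.

```python
import math

# Assumption: each route is one ordering of the 25 stops, i.e. 25! orderings.
routes = math.factorial(25)

print(f"{routes:,}")                        # 15,511,210,043,330,985,984,000,000
print(f"{routes / 1e24:.1f} septillion")    # 15.5 septillion
```

Under that assumption the count is about 1.55 × 10^25, which matches the "roughly 15 septillion" quoted above.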


What are the top 6 HR trends and workforce predictions for 2019?

#artificialintelligence

AI and machine learning unmask previously hidden workforce data to make people-centric decisions. Artificial intelligence (AI) and machine learning will finally be woven into workforce management practices, revealing a treasure trove of data organisations have been collecting – but not using – for decades. With regular and digestible access to workforce data trends – like scheduling accuracy, absenteeism, overtime usage, and burnout – predictive analytics will shine, helping organisations head off potential issues before they arise. Intelligent automation will also free up managers from admin-heavy tasks – like managing schedules, approving time-off requests, and shift changes – while encouraging data-driven decision-making to provide clarity between what is equal versus what is fair. Though, to harness analytical insights to make accurate, actionable decisions for specific employee and business goals, organisations must avoid a "one-size-fits-all" model.


Anagha Kulkarni

#artificialintelligence

I am a PhD student majoring in Computer Science at Arizona State University. I am a member of Yochan research group directed by Prof. Subbarao Kambhampati. Before joining ASU in 2015, I did my Master's at University of Southern California with a major in Computer Science. At USC, I worked on multi-agent path planning problems at IDM Lab while being supervised by Dr. T. K. Satish Kumar and on human-robot interaction related projects at Interaction Lab. If you'd like to contact me, please drop me a mail at anaghak at asu dot edu or find me on LinkedIn.


Complexity Bounds for the Controllability of Temporal Networks with Conditions, Disjunctions, and Uncertainty

arXiv.org Artificial Intelligence

In temporal planning, many different temporal network formalisms are used to model real world situations. Each of these formalisms has different features which affect how easy it is to determine whether the underlying network of temporal constraints is consistent. While many of the simpler models have been well-studied from a computational complexity perspective, the algorithms developed for advanced models which combine features have very loose complexity bounds. In this paper, we provide tight completeness bounds for strong, weak, and dynamic controllability checking of temporal networks that have conditions, disjunctions, and temporal uncertainty. Our work exposes some of the subtle differences between these different structures and, remarkably, establishes a guarantee that all of these problems are computable in PSPACE.
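For context, the simplest formalism in this family, a Simple Temporal Network with no conditions, disjunctions, or uncertainty, can be checked for consistency in polynomial time. The sketch below is a textbook-style check, not a method from the paper: it encodes each constraint as edges of a distance graph and uses Floyd-Warshall to look for a negative cycle. The function name and constraint tuple layout are assumptions made for illustration.

```python
import math


def stn_consistent(num_events, constraints):
    """Check a Simple Temporal Network for consistency.

    constraints: iterable of (i, j, lo, hi), meaning lo <= t_j - t_i <= hi.
    Returns True if some assignment of times satisfies every constraint.
    """
    # Distance graph: d[i][j] is the tightest known upper bound on t_j - t_i.
    d = [[0 if i == j else math.inf for j in range(num_events)]
         for i in range(num_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo

    # Floyd-Warshall all-pairs shortest paths.
    for k in range(num_events):
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # A negative diagonal entry means a negative cycle, i.e. inconsistency.
    return all(d[i][i] >= 0 for i in range(num_events))


# Three events: 0 -> 1 takes 10-20, 1 -> 2 takes 5-10.
ok = stn_consistent(3, [(0, 1, 10, 20), (1, 2, 5, 10), (0, 2, 15, 30)])
bad = stn_consistent(3, [(0, 1, 10, 20), (1, 2, 5, 10), (0, 2, 0, 12)])
print(ok, bad)
```

In the second network, the first two constraints force event 2 to occur at least 15 time units after event 0, which contradicts the upper bound of 12, so the check fails. The richer models the abstract discusses are harder precisely because such a single shortest-path pass no longer suffices.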


Towards Automated Network Mitigation Analysis (extended)

arXiv.org Artificial Intelligence

Penetration testing is a well-established practical concept for the identification of potentially exploitable security weaknesses and an important component of a security audit. Providing a holistic security assessment for networks consisting of several hundred hosts is hardly feasible, though, without some sort of mechanization. Mitigation, prioritizing counter-measures subject to a given budget, currently lacks a solid theoretical understanding and is hence more art than science. In this work, we propose the first approach for conducting comprehensive what-if analyses in order to reason about mitigation in a conceptually well-founded manner. To evaluate and compare mitigation strategies, we use simulated penetration testing, i.e., automated attack-finding, based on a network model to which a subset of a given set of mitigation actions (e.g., changes to the network topology, system updates, configuration changes) is applied. Using Stackelberg planning, we determine optimal combinations that minimize the maximal attacker success (similar to a Stackelberg game), and thus provide a well-founded basis for a holistic mitigation strategy. We show that these Stackelberg planning models can largely be derived from network scans, public vulnerability databases and manual inspection with various degrees of automation and detail, and we simulate mitigation analysis on networks of different size and vulnerability.
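The min-max idea in this abstract can be sketched on a toy instance. This is not the paper's Stackelberg planning algorithm: the attacker's options are a fixed, hand-written list rather than the result of simulated penetration testing, and the defender is brute-forced over all affordable mitigation subsets. Every attack name, probability, and cost below is hypothetical.

```python
from itertools import combinations

# Hypothetical attack paths: success probability and the mitigations that
# would each block the path on their own.
ATTACKS = {
    "phish_then_pivot": (0.9, {"mail_filter", "segment_network"}),
    "exploit_web_cve": (0.7, {"patch_web", "segment_network"}),
    "weak_ssh_password": (0.5, {"enforce_keys"}),
}
# Hypothetical mitigation costs.
COSTS = {"mail_filter": 2, "segment_network": 4, "patch_web": 1, "enforce_keys": 2}


def attacker_best(applied):
    """Follower's move: best success probability among unblocked attacks."""
    return max(
        (p for p, blockers in ATTACKS.values() if not (blockers & applied)),
        default=0.0,
    )


def best_mitigation(budget):
    """Leader's move: affordable subset minimizing the attacker's best success.

    Returns (attacker_success, cost, mitigations); ties go to the cheaper set.
    """
    best = (attacker_best(frozenset()), 0, frozenset())
    names = sorted(COSTS)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            cost = sum(COSTS[m] for m in combo)
            if cost > budget:
                continue
            candidate = (attacker_best(frozenset(combo)), cost, frozenset(combo))
            if candidate[:2] < best[:2]:
                best = candidate
    return best


for budget in (0, 3, 5):
    success, cost, chosen = best_mitigation(budget)
    print(budget, success, cost, sorted(chosen))
```

Enumerating subsets is exponential in the number of mitigations, so this only works for toy inputs; it is meant to show the leader-follower structure, in which the defender commits first and the attacker then responds optimally.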


Ethically Aligned Opportunistic Scheduling for Productive Laziness

arXiv.org Artificial Intelligence

In artificial intelligence (AI) mediated workforce management systems (e.g., crowdsourcing), long-term success depends on workers accomplishing tasks productively and resting well. This dual objective can be summarized by the concept of productive laziness. Existing scheduling approaches mostly focus on efficiency but overlook worker wellbeing through proper rest. In order to enable workforce management systems to follow the IEEE Ethically Aligned Design guidelines to prioritize worker wellbeing, we propose a distributed Computational Productive Laziness (CPL) approach in this paper. It intelligently recommends personalized work-rest schedules based on local data concerning a worker's capabilities and situational factors to incorporate opportunistic resting and achieve superlinear collective productivity without the need for explicit coordination messages. Extensive experiments based on a real-world dataset of over 5,000 workers demonstrate that CPL enables workers to spend 70% of the effort to complete 90% of the tasks on average, providing more ethically aligned scheduling than existing approaches.


Learning Plannable Representations with Causal InfoGAN

Neural Information Processing Systems

In recent years, deep generative models have been shown to 'imagine' convincing high-dimensional observations such as images, audio, and even video, learning directly from raw data. In this work, we ask how to imagine goal-directed visual plans -- a plausible sequence of observations that transitions a dynamical system from its current configuration to a desired goal state, which can later be used as a reference trajectory for control. We focus on systems with high-dimensional observations, such as images, and propose an approach that naturally combines representation learning and planning. Our framework learns a generative model of sequential observations, where the generative process is induced by a transition in a low-dimensional planning model and additional noise. By maximizing the mutual information between the generated observations and the transition in the planning model, we obtain a low-dimensional representation that best explains the causal nature of the data. We structure the planning model to be compatible with efficient planning algorithms, and we propose several such models based on either discrete or continuous states. Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations. We demonstrate our method on imagining plausible visual plans of rope manipulation.


Monte-Carlo Tree Search for Constrained POMDPs

Neural Information Processing Systems

Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.


Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies

Neural Information Processing Systems

We introduce a new RL problem where the agent is required to generalize to a previously unseen environment characterized by a subtask graph which describes a set of subtasks and their dependencies. Unlike existing hierarchical multitask RL approaches that explicitly describe what the agent should do at a high level, our problem only describes properties of subtasks and relationships among them, which requires the agent to perform complex reasoning to find the optimal subtask to execute. To solve this problem, we propose a neural subtask graph solver (NSGS) which encodes the subtask graph using a recursive neural network embedding. To overcome the difficulty of training, we propose a novel non-parametric gradient-based policy, graph reward propagation, to pre-train our NSGS agent and further finetune it through an actor-critic method. The experimental results on two 2D visual domains show that our agent can perform complex reasoning to find a near-optimal way of executing the subtask graph and generalize well to unseen subtask graphs. In addition, we compare our agent with a Monte-Carlo tree search (MCTS) method, showing that our method is much more efficient than MCTS, and that the performance of NSGS can be further improved by combining it with MCTS.