Goto

Collaborating Authors

 Planning & Scheduling


Comparing Alternative Route Planning Techniques: A Web-based Demonstration and User Study

arXiv.org Artificial Intelligence

Due to the popularity of smartphones, cheap wireless networks and availability of road network data, navigation applications have become a part of our everyday life. Many modern navigation systems and map-based services do not only provide the fastest route from a source location s to a target location t but also provide a few alternative routes to the users as more options to choose from. Consequently, computing alternative paths from a source s to a target t has received significant research attention in the past few years. However, it is not clear which of the existing approaches generates alternative paths of better quality because the quality of these alternatives is mostly subjective. Motivated by this, in this paper, we present the first user study that compares the quality of the alternative routes generated by four of the most popular existing approaches including the routes provided by Google Maps. We also present the details of a web-based demo system that can be accessed using any internet enabled device and allows users to see the alternative routes generated by the four approaches for any pair of source and target selected by the users. Our user study shows that although the mean rating received by Google Maps is slightly lower than the mean ratings received by the other three approaches, the results are not statistically significant. We also discuss the limitations of this user study and recommend the readers to interpret these results with caution because certain factors beyond our control may have affected the participants' ratings.


Online Bayesian Goal Inference for Boundedly-Rational Planning Agents

arXiv.org Artificial Intelligence

People routinely infer the goals of others by observing their actions over time. Remarkably, we can do so even when those actions lead to failure, enabling us to assist others when we detect that they might not achieve their goals. How might we endow machines with similar capabilities? Here we present an architecture capable of inferring an agent's goals online from both optimal and non-optimal sequences of actions. Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning, thereby accounting for sub-optimal behavior. These models are specified as probabilistic programs, allowing us to represent and perform efficient Bayesian inference over an agent's goals and internal planning processes. To perform such inference, we develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models, limiting computation by incrementally extending inferred plans as new actions are observed. We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.


Optimal Allocation of Real-Time-Bidding and Direct Campaigns

arXiv.org Artificial Intelligence

In this paper, we consider the problem of optimizing the revenue a web publisher gets through real-time bidding (i.e. from ads sold in real-time auctions) and direct (i.e. from ads sold through contracts agreed in advance). We consider a setting where the publisher is able to bid in the real-time bidding auction for each impression. If it wins the auction, it chooses a direct campaign to deliver and displays the corresponding ad. This paper presents an algorithm to build an optimal strategy for the publisher to deliver its direct campaigns while maximizing its real-time bidding revenue. The optimal strategy gives a formula to determine the publisher bid as well as a way to choose the direct campaign being delivered if the publisher bidder wins the auction, depending on the impression characteristics. The optimal strategy can be estimated on past auctions data. The algorithm scales with the number of campaigns and the size of the dataset. This is a very important feature, as in practice a publisher may have thousands of active direct campaigns at the same time and would like to estimate an optimal strategy on billions of auctions. The algorithm is a key component of a system which is being developed, and which will be deployed on thousands of web publishers worldwide, helping them to serve efficiently billions of ads a day to hundreds of millions of visitors.


From proprioception to long-horizon planning in novel environments: A hierarchical RL model

arXiv.org Artificial Intelligence

For an intelligent agent to flexibly and efficiently operate in complex environments, they must be able to reason at multiple levels of temporal, spatial, and conceptual abstraction. At the lower levels, the agent must interpret their proprioceptive inputs and control their muscles, and at the higher levels, the agent must select goals and plan how they will achieve those goals. It is clear that each of these types of reasoning is amenable to different types of representations, algorithms, and inputs. In this work, we introduce a simple, three-level hierarchical architecture that reflects these distinctions. The low-level controller operates on the continuous proprioceptive inputs, using model-free learning to acquire useful behaviors. These in turn induce a set of mid-level dynamics, which are learned by the mid-level controller and used for model-predictive control, to select a behavior to activate at each timestep. The high-level controller leverages a discrete, graph representation for goal selection and path planning to specify targets for the mid-level controller. We apply our method to a series of navigation tasks in the Mujoco Ant environment, consistently demonstrating significant improvements in sample-efficiency compared to prior model-free, model-based, and hierarchical RL methods. Finally, as an illustrative example of the advantages of our architecture, we apply our method to a complex maze environment that requires efficient exploration and long-horizon planning.


Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

arXiv.org Machine Learning

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, that are mostly theoretical.


POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

arXiv.org Artificial Intelligence

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.


Learning compositional models of robot skills for task and motion planning

arXiv.org Artificial Intelligence

The objective of this work is to augment the basic abilities of a robot by learning to use new sensorimotor primitives to solve complex long-horizon manipulation problems. This requires flexible generative planning that can combine primitive abilities in novel combinations and thus generalize across a wide variety of problems. In order to plan with primitive actions, we must have models of the preconditions and effects of those actions: under what circumstances will executing this primitive successfully achieve some particular effect in the world? We use, and develop novel improvements on, state-of-the-art methods for active learning and sampling. We use Gaussian process methods for learning the conditions of operator effectiveness from small numbers of expensive training examples. We develop adaptive sampling methods for generating a comprehensive and diverse sequence of continuous parameter values (such as pouring waypoints for a cup) configurations and during planning for solving a new task, so that a complete robot plan can be found as efficiently as possible. We demonstrate our approach in an integrated system, combining traditional robotics primitives with our newly learned models using an efficient robot task and motion planner. We evaluate our approach both in simulation and in the real world through measuring the quality of the selected pours and scoops. Finally, we apply our integrated system to a variety of long-horizon simulated and real-world manipulation problems.


New Fusion Algorithm provides an alternative approach to Robotic Path planning

arXiv.org Artificial Intelligence

For rapid growth in technology and automation, human tasks are being taken over by robots as robots have proven to be better with both speed and precision. One of the major and widespread usages of these robots is in the industrial businesses, where they are employed to carry massive loads in and around work areas. As these working environments might not be completely localized and could be dynamically changing, new approaches must be evaluated to guarantee a crash-free way of performing duties. This paper presents a new and efficient fusion algorithm for solving the path planning problem in a custom 2D environment. This fusion algorithm integrates an improved and optimized version of both, A* algorithm and the Artificial potential field method. Firstly, an initial or preliminary path is planned in the environmental model by adopting the A* algorithm. The heuristic function of this A* algorithm is optimized and improved according to the environmental model. This is followed by selecting and saving the key nodes in the initial path. Lastly, on the basis of these saved key nodes, path smoothing is done by artificial potential field method. Our simulation results carried out using Python viz. libraries indicate that the new fusion algorithm is feasible and superior in smoothness performance and can satisfy as a time-efficient and cheaper alternative to conventional A* strategies of path planning.


Every Action Based Sensor

arXiv.org Artificial Intelligence

In studying robots and planning problems, a basic question is what is the minimal information a robot must obtain to guarantee task completion. Erdmann's theory of action-based sensors is a classical approach to characterizing fundamental information requirements. That approach uses a plan to derive a type of virtual sensor which prescribes actions that make progress toward a goal. We show that the established theory is incomplete: the previous method for obtaining such sensors, using backchained plans, overlooks some sensors. Furthermore, there are plans, that are guaranteed to achieve goals, where the existing methods are unable to provide any action-based sensor. We identify the underlying feature common to all such plans. Then, we show how to produce action-based sensors even for plans where the existing treatment is inadequate, although for these cases they have no single canonical sensor. Consequently, the approach is generalized to produce sets of sensors. Finally, we show also that this is a complete characterization of action-based sensors for planning problems and discuss how an action-based sensor translates into the traditional conception of a sensor.


DeepSoCS: A Neural Scheduler for Heterogeneous System-on-Chip (SoC) Resource Scheduling

arXiv.org Artificial Intelligence

In this paper, we~present a novel scheduling solution for a class of System-on-Chip (SoC) systems where heterogeneous chip resources (DSP, FPGA, GPU, etc.) must be efficiently scheduled for continuously arriving hierarchical jobs with their tasks represented by a directed acyclic graph. Traditionally, heuristic algorithms have been widely used for many resource scheduling domains, and Heterogeneous Earliest Finish Time (HEFT) has been a dominating state-of-the-art technique across a broad range of heterogeneous resource scheduling domains over many years. Despite their long-standing popularity, HEFT-like algorithms are known to be vulnerable to a small amount of noise added to the environment. Our Deep Reinforcement Learning (DRL)-based SoC Scheduler (DeepSoCS), capable of learning the "best" task ordering under dynamic environment changes, overcomes the brittleness of rule-based schedulers such as HEFT with significantly higher performance across different types of jobs. We~describe a DeepSoCS design process using a real-time heterogeneous SoC scheduling emulator, discuss major challenges, and present two novel neural network design features that lead to outperforming HEFT: (i) hierarchical job- and task-graph embedding; and (ii) efficient use of real-time task information in the state space. Furthermore, we~introduce effective techniques to address two fundamental challenges present in our environment: delayed consequences and joint actions. Through an extensive simulation study, we~show that our DeepSoCS exhibits the significantly higher performance of job execution time than that of HEFT with a higher level of robustness under realistic noise conditions. We~conclude with a discussion of the potential improvements for our DeepSoCS neural scheduler.