
Collaborating Authors

 simulator


A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems

Yi Ma

Neural Information Processing Systems

To address this problem, existing methods either cache online-generated orders to partition the overall DPDP into fixed-size sub-problems and then solve each sub-problem, or build on this by using predicted future orders to further optimize each sub-problem. However, the solution quality and efficiency of these methods are unsatisfactory, especially when the problem scale is very large.
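The fixed-size partitioning attributed to existing methods can be pictured as a buffer that caches orders as they arrive online and hands off a sub-problem each time it fills. This is a minimal sketch under assumptions not stated in the snippet: the `Order` fields, the `fixed_size_subproblems` and `solve_all` names, and the pluggable `solve` callback are hypothetical, and no actual routing solver is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Order:
    # Hypothetical order record; real DPDP orders also carry locations, loads, time windows.
    order_id: int
    release_time: float  # time at which the order appears online


def fixed_size_subproblems(orders: Iterable[Order], batch_size: int) -> Iterator[List[Order]]:
    """Cache incoming orders and emit one sub-problem every `batch_size` orders."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[Order] = []
    for order in orders:
        buffer.append(order)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:  # flush the last, possibly smaller, batch
        yield buffer


def solve_all(orders: Iterable[Order], batch_size: int,
              solve: Callable[[List[Order]], object]) -> list:
    """Solve each cached sub-problem independently with a caller-supplied solver."""
    return [solve(batch) for batch in fixed_size_subproblems(orders, batch_size)]
```

For example, a stream of seven orders with `batch_size=3` yields sub-problems of sizes 3, 3 and 1. Because each batch is solved in isolation, orders in different batches cannot be jointly optimized, which is one way to read the snippet's point about limited solution quality at large scale.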




e197fe307eb3467035f892dc100d570a-Supplemental-Conference.pdf

Neural Information Processing Systems

The process for calculating these metrics is described in Appendix C. Moreover, to ensure comparability between prediction performance metrics and driving performance metrics in the radar plot, we normalize all metrics to the range [0, 1]. In the subsequent section, we provide an overview of the DESPOT planner. These two values can only be inferred from history. Safety is represented by the normalized collision rate.
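The snippet does not say which normalization maps the metrics onto [0, 1]; a common choice is min-max scaling, used here purely as an assumption. The function names are hypothetical, and the numbers in the usage note are made up for illustration, not results from the paper.

```python
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_metrics(metrics: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Normalize each metric independently across the compared methods."""
    return {name: min_max_normalize(vals) for name, vals in metrics.items()}
```

With illustrative values `[2.0, 10.0, 6.0]`, `min_max_normalize` returns `[0.0, 1.0, 0.5]`. Normalizing each metric separately is what lets prediction and driving metrics with different units share one radar-plot axis scale.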








Erik Miehling

Neural Information Processing Systems

The high-level architecture of our simulator is illustrated in Figure 1 of Section 4. Additional details (with references to objects in the source code) are provided below. Simulations were run in Python 3.8 on an Intel(R) Xeon(R) CPU E5-2667 … One direction is to extend the feature description of the ads (beyond topic) to include features that reflect ad quality and location. Baseline parameters are: µ = 0 … The resulting cohort errors are consistent with Figure 1 of Section 5.1. For the fully informative prior, the agent is completely certain of users' cohorts for … Lastly, for the uninformative prior, revelation of a user's cookie does not inform the … The agent's ability to distinguish users based on their responses depends on the similarity of affinities across users in different cohorts.
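The link between priors, affinities and cohort identification can be illustrated with a Bayes update over cohorts. This sketch rests on assumptions the snippet does not state: responses are modeled as Bernoulli clicks with a cohort-specific affinity (click probability), the "uninformative" prior is taken to be uniform, and `cohort_posterior` is a hypothetical name, not an object from the authors' simulator.

```python
from typing import List, Sequence


def cohort_posterior(prior: Sequence[float], affinities: Sequence[float],
                     clicks: Sequence[int]) -> List[float]:
    """Posterior over cohorts after observing a user's binary responses.

    prior[k]      -- prior probability that the user belongs to cohort k
    affinities[k] -- assumed click probability for a user in cohort k
    clicks        -- observed responses (1 = click, 0 = no click)
    """
    if len(prior) != len(affinities):
        raise ValueError("prior and affinities must have the same length")
    unnormalized = []
    for p, a in zip(prior, affinities):
        likelihood = 1.0
        for c in clicks:
            likelihood *= a if c == 1 else (1.0 - a)
        unnormalized.append(p * likelihood)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observations have zero probability under the prior")
    return [u / total for u in unnormalized]
```

Three cases mirror the snippet. A one-hot prior such as `[1.0, 0.0]` stays `[1.0, 0.0]` whatever is observed, matching full certainty about the cohort. A uniform prior with identical affinities, e.g. `[0.5, 0.5]`, stays uniform, because responses carry no information. With well-separated affinities `[0.9, 0.1]` and three clicks, the posterior on the first cohort rises above 0.99, showing how dissimilar affinities make cohorts distinguishable.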