seeder


Learning Collaborative Policies to Solve NP-hard Routing Problems

Kim, Minsu, Park, Jinkyoo, Kim, Joungho

Neural Information Processing Systems

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, leaving a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimum solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible, dedicating itself to exploring the full combinatorial action space (i.e., the sequence of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy-regularization reward that encourages it to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder: it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. The reviser is thus trained to improve the quality of the candidate solution while focusing on a reduced solution space, which is beneficial for exploitation. Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including TSP, the prize-collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP).
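The seeder-reviser collaboration described in the abstract can be made concrete with a short sketch. The code below is a minimal illustration and not the authors' implementation: sample_seed_tour stands in for the learned seeder (here just a random permutation), revise_subtour stands in for the learned reviser (here a plain 2-opt pass with fixed segment endpoints), and the segment size and seed count are illustrative parameters.

```python
import math
import random

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over 2-D coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def sample_seed_tour(coords):
    """Stand-in for the seeder policy: a random permutation of nodes.
    The real seeder is a DRL policy trained with an entropy-regularized
    reward so that sampled tours stay diverse."""
    tour = list(range(len(coords)))
    random.shuffle(tour)
    return tour

def revise_subtour(subtour, coords):
    """Stand-in for the reviser policy: a 2-opt pass over one segment.
    The first and last nodes are kept fixed, so revised segments can be
    concatenated back into a valid full tour."""
    sub = list(subtour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(sub) - 2):
            for j in range(i + 1, len(sub) - 1):
                before = (math.dist(coords[sub[i - 1]], coords[sub[i]])
                          + math.dist(coords[sub[j]], coords[sub[j + 1]]))
                after = (math.dist(coords[sub[i - 1]], coords[sub[j]])
                         + math.dist(coords[sub[i]], coords[sub[j + 1]]))
                if after < before - 1e-12:
                    sub[i:j + 1] = reversed(sub[i:j + 1])
                    improved = True
    return sub

def lcp_solve(coords, num_seeds=10, segment=20):
    """Seeder-reviser collaboration: sample diverse seeds, partition each
    into sub-tours, revise every sub-tour, and keep the best full tour."""
    best = None
    for _ in range(num_seeds):
        tour = sample_seed_tour(coords)
        revised = []
        for start in range(0, len(tour), segment):
            revised.extend(revise_subtour(tour[start:start + segment], coords))
        if best is None or tour_length(revised, coords) < tour_length(best, coords):
            best = revised
    return best
```

The division of labor is the point of the scheme: the seeder only has to cover the solution space broadly (diversity), while the reviser only has to polish short fixed-endpoint segments (exploitation over a much smaller space), which is far easier than optimizing the full tour at once.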




A Details of Experiments

Neural Information Processing Systems

R is the prize of the visited node. The MDP formulation, including the training scheme, is mostly the same as for TSP. This section provides implementation details of the seeder for the experiments. The details of setting T in the inference phase (i.e., in the experiments) are described in Appendix A.5.

A.3 Detailed Implementation of Reviser

This section describes the detailed implementation of the reviser for each target problem.
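The excerpt mentions a per-problem reviser and an inference-time iteration count T. As a rough illustration of how such an iterated revision loop might look (reusing the hypothetical revise_subtour helper from the sketch above; the rotate-then-repartition scheme is an assumption for illustration, not the paper's exact procedure):

```python
def iterative_revise(tour, coords, T=5, segment=20):
    """Hypothetical inference loop: apply the reviser T times, rotating
    the partition offset each pass so that segment boundaries from one
    pass fall inside segments of the next and can still be improved.
    Rotating a closed tour leaves its length unchanged."""
    for t in range(T):
        offset = (t * segment // 2) % len(tour)   # shift the cut points
        rotated = tour[offset:] + tour[:offset]
        revised = []
        for start in range(0, len(rotated), segment):
            revised.extend(revise_subtour(rotated[start:start + segment], coords))
        tour = revised
    return tour
```

Larger T trades inference time for solution quality, which is presumably why the paper reports its choice of T per experiment in Appendix A.5.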






The Dot Power Platform Could Transform Farming Technology

WIRED

The Dot Power Platform is a prime example of an explosion in advanced agricultural technology, which Goldman Sachs predicts will raise crop yields 70 percent by 2050. But Dot isn't just a tractor that can drive without a human for backup. It's the Transformer of ag-bots, capable of performing 100-plus jobs, from hay baler and seeder to rock picker and manure spreader, via an arsenal of tool modules. And though the hulking machine can carry 40,000 pounds, it navigates fields with balletic precision. Farmers map their land using an aerial drone or GPS receiver, upload that data to the Dot controller--a Microsoft Surface Pro--then unleash the beast into the field.


Should we anthropomorphize an AI who wants to kill us all?

#artificialintelligence

Musk, Gates, and Hawking are worrying about AI. But would a sentient AI really want to kill us all? And if it does, should we anthropomorphize the AI to give us humans some measure of advantage? One way to consider these questions is to peer into our human nature or even science fiction for clues. The answers may be a matter of perspective: Are we looking into the window or out?