seeder
This paper solves three NP-hard routing problems: the traveling salesman problem (TSP), the prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP). This section provides detailed descriptions of the PCTSP and CVRP (for the TSP, see section 3). The PCTSP is similar to the TSP but differs in two ways: not every node has to be visited, and the tour's destination is the depot node rather than the first node, i.e., a tour is not a cycle. Let $N$ be the number of nodes. A PCTSP instance is $s = \{(x_i, \lambda_i, \mu_i)\}_{i=1}^{N+1}$, where $x_i \in \mathbb{R}^2$ are the 2D Euclidean coordinates of node $i$, $\lambda_i \in \mathbb{R}$ is the penalty incurred if node $i$ is left unvisited, and $\mu_i \in \mathbb{R}$ is the prize collected if node $i$ is visited. $L(\pi|s)$ denotes the tour length, and $\lambda(\pi|s)$ denotes the total penalty of the unvisited nodes.
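To make $L(\pi|s)$ and $\lambda(\pi|s)$ concrete, here is a minimal Python sketch that evaluates both for a given route. The node layout `(x, y, penalty, prize)`, the choice of index 0 as the depot, the route starting and ending there, and the function names are all assumptions for illustration, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical layout (an assumption, not the paper's data format):
# each node is (x, y, penalty, prize), i.e. x_i = (x, y), lambda_i, mu_i.
# Index 0 is assumed to be the depot.
Node = Tuple[float, float, float, float]


def tour_length(route: Sequence[int], nodes: Sequence[Node]) -> float:
    """L(pi|s): summed Euclidean distance along the route, in visiting order."""
    return sum(
        math.dist(nodes[a][:2], nodes[b][:2]) for a, b in zip(route, route[1:])
    )


def unvisited_penalty(route: Sequence[int], nodes: Sequence[Node]) -> float:
    """lambda(pi|s): total penalty lambda_i over nodes absent from the route."""
    visited = set(route)
    return sum(node[2] for i, node in enumerate(nodes) if i not in visited)


def collected_prize(route: Sequence[int], nodes: Sequence[Node]) -> float:
    """Total prize mu_i over the distinct nodes the route visits."""
    return sum(nodes[i][3] for i in set(route))


nodes = [
    (0.0, 0.0, 0.0, 0.0),   # depot (assumed index 0)
    (3.0, 0.0, 1.5, 0.5),
    (3.0, 4.0, 2.0, 0.25),
    (0.0, 4.0, 0.5, 0.2),
]
route = [0, 1, 2, 0]  # assumed to start and end at the depot; node 3 is skipped
print(tour_length(route, nodes))        # 12.0
print(unvisited_penalty(route, nodes))  # 0.5
print(collected_prize(route, nodes))    # 0.75
```

How these terms combine into the PCTSP objective is not stated in this excerpt, so the sketch computes them separately.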
Learning Collaborative Policies to Solve NP-hard Routing Problems
Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still lag behind state-of-the-art heuristics by a substantial margin. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimal solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while exploring the full combinatorial action space (i.e., the sequence of assignment actions). To this end, we train the seeder's policy with a simple yet effective entropy regularization reward that encourages it to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder: it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. The reviser is thus trained to improve the quality of each candidate solution while focusing on a reduced solution space, which is beneficial for exploitation. Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including the TSP, prize collecting TSP (PCTSP), and capacitated vehicle routing problem (CVRP).
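The reviser's decomposition can be illustrated with a small Python sketch. It is a stand-in, not LCP's learned reviser: it splits a seed tour into consecutive segments of at most `k` nodes, keeps each segment's endpoints fixed, and reorders the interior nodes by exhaustive search to minimize that segment's path length. Because segment endpoints and the edges between segments are untouched, the segments can be revised independently, and the total tour length cannot increase. The segment size `k`, the brute-force search, and all function names are assumptions for illustration.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_length(order: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the open path visiting pts in the given order."""
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(order, order[1:]))


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed TSP tour (includes the edge back to the start)."""
    return path_length(list(tour) + [tour[0]], pts)


def revise_segment(seg: List[int], pts: Sequence[Point]) -> List[int]:
    """Reorder a segment's interior nodes, keeping both endpoints fixed.

    Brute-force stand-in for a learned reviser; only viable for small segments.
    """
    if len(seg) <= 3:  # at most one interior node: nothing to reorder
        return list(seg)
    first, inner, last = seg[0], seg[1:-1], seg[-1]
    # permutations() yields the original order first, so ties keep it.
    best = min(
        itertools.permutations(inner),
        key=lambda p: path_length([first, *p, last], pts),
    )
    return [first, *best, last]


def revise_tour(tour: Sequence[int], pts: Sequence[Point], k: int = 5) -> List[int]:
    """Split the seed tour into chunks of <= k nodes and revise each one.

    Chunks share no nodes and inter-chunk edges are untouched, so the chunks
    are independent and could be processed in parallel.
    """
    revised: List[int] = []
    for start in range(0, len(tour), k):
        revised.extend(revise_segment(list(tour[start:start + k]), pts))
    return revised


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]
seed = [0, 2, 1, 3, 4, 5]            # a hypothetical seed with a detour
better = revise_tour(seed, pts, k=4)
print(better)                        # [0, 1, 2, 3, 4, 5]
print(tour_length(better, pts))      # 6.0
```

Exhaustive search grows factorially with `k`, which is one reason a learned policy is attractive for the revision step; the sketch only shows the partition-and-revise structure.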
Minsu Kim, Jinkyoo Park, Joungho Kim
The Dot Power Platform Could Transform Farming Technology
The Dot Power Platform is a prime example of an explosion in advanced agricultural technology, which Goldman Sachs predicts will raise crop yields 70 percent by 2050. But Dot isn't just a tractor that can drive without a human for backup. It's the Transformer of ag-bots, capable of performing 100-plus jobs, from hay baler and seeder to rock picker and manure spreader, via an arsenal of tool modules. And though the hulking machine can carry 40,000 pounds, it navigates fields with balletic precision. Farmers map their land using an aerial drone or GPS receiver, upload that data to the Dot controller (a Microsoft Surface Pro), then unleash the beast into the field.
Should we anthropomorphize an AI who wants to kill us all?
Musk, Gates, and Hawking are worrying about AI. But would a sentient AI really want to kill us all? And if it does, should we anthropomorphize the AI to give us humans some measure of advantage? One way to consider these questions is to peer into our human nature or even science fiction for clues. The answers may be a matter of perspective: Are we looking into the window or out?