action validity prediction network
Supplementary Material for "Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning"
Hyunsoo Chung, Jungtaek Kim, Boris Knyazev
In this material, we first describe the importance of action validity prediction networks. Then, we introduce the details of the benchmarks, describe the model architecture, and present additional experimental results that were omitted from the main article.

Figure s.1 reports the wall-clock time needed to compute the ground-truth action validity: for a combination of 100 bricks, this computation takes more than 20 seconds. Table s.1 summarizes comparisons between possible action validation approaches.
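To illustrate why computing ground-truth validity is expensive, here is a minimal brute-force sketch under a simplified, hypothetical brick model. It is not the paper's implementation. The assumptions are: every brick is a 2x4 cuboid of unit height on an unbounded integer lattice, it is stored as `(x, y, z, rotated)`, two bricks connect when they sit on adjacent layers and share at least one stud cell, and two bricks collide when they share a layer and their footprints intersect. The function names (`footprint`, `overlaps`, `connects`, `valid_actions`) are illustrative only. Each candidate placement must be checked against every existing brick. Under this toy model, there are at most 92 candidates per brick, so the cost grows roughly quadratically with the number of bricks.

```python
from itertools import product

# Hypothetical brick model (an assumption, not the paper's exact rule set):
# a brick is (x, y, z, rotated), a 2x4 cuboid of unit height whose footprint
# is anchored at (x, y) on layer z; rotated=True swaps its width and depth.

def footprint(brick):
    """Return the set of (x, y) stud cells covered by a brick."""
    x, y, _, rotated = brick
    w, d = (4, 2) if rotated else (2, 4)
    return {(x + i, y + j) for i in range(w) for j in range(d)}

def overlaps(a, b):
    """Two bricks collide if they share a layer and their footprints intersect."""
    return a[2] == b[2] and bool(footprint(a) & footprint(b))

def connects(a, b):
    """Two bricks connect if they sit on adjacent layers and share a stud cell."""
    return abs(a[2] - b[2]) == 1 and bool(footprint(a) & footprint(b))

def valid_actions(bricks):
    """Brute-force ground-truth validity: enumerate every placement that
    connects to some existing brick, then check it against *all* bricks."""
    candidates = set()
    for base in bricks:
        bx, by, bz, _ = base
        for dz, rotated in product((-1, 1), (False, True)):
            # Offsets in [-3, 3] cover every placement that can share a stud
            # with the base brick, since no brick side exceeds 4 cells.
            for dx in range(-3, 4):
                for dy in range(-3, 4):
                    cand = (bx + dx, by + dy, bz + dz, rotated)
                    if connects(cand, base):
                        candidates.add(cand)
    # Each candidate is tested against every brick: O(|candidates| * |bricks|).
    return sorted(c for c in candidates
                  if not any(overlaps(c, b) for b in bricks))

print(len(valid_actions([(0, 0, 0, False)])))  # prints 92
```

For a single brick, there are 21 unrotated and 25 rotated stud-sharing placements on each of the two adjacent layers, which gives 92 valid actions. Once bricks are stacked, some of these candidates collide with existing bricks and have to be filtered out. This per-candidate, per-brick check is the cost that a learned validity predictor aims to avoid.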
Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning
Hyunsoo Chung, Jungtaek Kim, Boris Knyazev, Jinhwi Lee, Graham W. Taylor, Jaesik Park, Minsu Cho
Discovering a solution in a combinatorial space arises in many real-world problems, but it is challenging because of diverse, complex constraints and the vast number of possible combinations. To address such a problem, we introduce a novel formulation, combinatorial construction, which requires a building agent to assemble unit primitives (i.e., LEGO bricks) sequentially: every connection between two bricks must follow a fixed rule, and no two bricks may overlap. To construct a target object, we give the agent incomplete knowledge about the desired target (i.e., 2D images) instead of exact and explicit volumetric information. This problem requires a comprehensive understanding of partial information and long-term planning to append bricks sequentially, which leads us to employ reinforcement learning. The approach has to handle a variable-sized action space that contains a large number of invalid actions, namely actions that would cause bricks to overlap. To resolve these issues, our model, dubbed Brick-by-Brick, adopts an action validity prediction network that efficiently filters out invalid actions for an actor-critic network. We demonstrate that the proposed method successfully learns to construct an unseen object conditioned on a single image or multiple views of a target object.
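The abstract states that the validity predictor filters invalid actions for the actor-critic network over a variable-sized action space. One common way to do this is to mask the actor's logits before the softmax. The sketch below uses that technique, which is an assumption here and not the paper's confirmed mechanism. The function name `masked_policy` and the 0.5 `threshold` are hypothetical. The inputs are one actor logit and one predicted validity probability per candidate action, so the list length can change from step to step.

```python
import math

def masked_policy(logits, validity_probs, threshold=0.5):
    """Turn actor logits over a variable number of candidate actions into a
    probability distribution, giving zero probability to every action whose
    predicted validity is below `threshold` (hypothetical masking scheme)."""
    if len(logits) != len(validity_probs):
        raise ValueError("need one validity score per candidate action")
    keep = [p >= threshold for p in validity_probs]
    if not any(keep):
        raise ValueError("no action is predicted to be valid")
    # Subtract the largest kept logit before exponentiating for stability.
    m = max(l for l, k in zip(logits, keep) if k)
    weights = [math.exp(l - m) if k else 0.0 for l, k in zip(logits, keep)]
    total = sum(weights)
    return [w / total for w in weights]

# Three candidate actions; the predictor flags the second one as invalid.
probs = masked_policy([2.0, 1.0, 0.5], [0.9, 0.1, 0.8])
print(probs[1])  # prints 0.0
```

Masking with a hard threshold means the actor never samples an action that the predictor rejects. Because the predictor is approximate, a real system would still need to handle mistakes, for example by checking the chosen action for overlap before applying it.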