Goto

Collaborating Authors

 Search


Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has emerged as a promising tool for combinatorial optimization (CO) problems due to its ability to learn fast, effective, and generalizable solutions. Nonetheless, existing works mostly focus on one-shot deterministic CO, while sequential stochastic CO (SSCO) has rarely been studied despite its broad applications such as adaptive influence maximization (IM) and infectious disease intervention. In this paper, we study the SSCO problem where we first decide the budget (e.g., number of seed nodes in adaptive IM) allocation for all time steps, and then select a set of nodes for each time step. The few existing studies on SSCO simplify the problems by assuming a uniformly distributed budget allocation over the time horizon, yielding suboptimal solutions. We propose a generic hierarchical RL (HRL) framework called wake-sleep option (WS-option), a two-layer option-based framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. WS-option starts with a coherent formulation of the two-layer Markov decision processes (MDPs), capturing the interdependencies between the two layers of decisions. Building on this, WS-option employs several innovative designs to balance the model's training stability and computational efficiency, preventing the vicious cyclic interference issue between the two layers. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources.


Neural Genetic Search in Discrete Spaces

arXiv.org Artificial Intelligence

Effective search methods are crucial for improving the performance of deep generative models at test time. In this paper, we introduce a novel test-time search method, Neural Genetic Search (NGS), which incorporates the evolutionary mechanism of genetic algorithms into the generation procedure of deep models. The core idea behind NGS is its crossover, which is defined as parent-conditioned generation using trained generative models. This approach offers a versatile and easy-to-implement search algorithm for deep generative models. We demonstrate the effectiveness and flexibility of NGS through experiments across three distinct domains: routing problems, adversarial prompt generation for language models, and molecular design.


Review for NeurIPS paper: Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Neural Information Processing Systems

Additional Feedback: 148: "ranked by confidence scores in policy distribution": are these nodes with the highest probability under the current policy \pi? 163: I was unclear on what metric M is used. I think it would be helpful to give some more motivation for the proxy graph. Is the proxy graph necessary because the underlying navigation graphs are too large or too densely connected to run a GNN on? Or does the proxy graph improve performance on the metrics? Can "mp" in Table 3 be defined in terms of the K_p or K_L defined in section 4.2? --- update after author response --- Thank you for all your helpful clarifications! They largely address my concerns, and I've updated my score to a 6.


Review for NeurIPS paper: Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Neural Information Processing Systems

This paper addresses the problem of vision-and-language navigation from raw visual input and language instructions in a photorealistic indoor environment (Room-to-Room) by iteratively building a high-level graph representation and then goal-driven planning using Graph Neural Networks. Instead of planning on the full graph, the model predicts actions over the fringe nodes of that graph (i.e., jumps through the graph using shortest path) and it also predicts and plans on a sparser proxy graph representation (these are novel ideas). It is trained using imitation learning. After discussion and authors' rebuttal, the reviewers' scores are (6, 7, 7, 6). While many of the reviewers' concerns are addressed, the main remaining concerns are a missing comparison to graph search methods (specifically: "Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation"), confusion about the word planning in an imitation learning setting, discussions about how loop closure is performed, acknowledging the competitive advantage of knowing which nodes of the graph are frontier nodes, and scarce information about how to reproduce the work.


Review for NeurIPS paper: Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems

Neural Information Processing Systems

What is the lower-bound the paper refer to? - Currently the way that the paper is written does not highlight the novelties of the work. It seems the work combines existing methods for solving min-max with exiting variance reduction modules inside. Please clarify the novelties in the analysis further (the explanation in section 5 does not highlight the difficulties and challenges).


Review for NeurIPS paper: Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems

Neural Information Processing Systems

The reviewers all agree that the improved complexity is new for the class of stochastic nonconvex-strongly-concave minimax problems, and the claim of optimal complexity is clarified in the rebuttal. The weakness pointed out by the reviewers is the seemingly lack of novelty by combining existing techniques of SGDA and SARAH gradient estimators, but the author rebuttal is convincing that the combination in the minimax setting requires innovation. Another weakness is that the SREDA algorithm is rather complex, having multi-level loops and can be hard to tune in practice. Overall I recommend acceptance based on the value of the theoretical improvement. The authors should carefully address the remaining concerns of the reviewers in the revision, especially to clarify the claim of optimal complexity.


Review for NeurIPS paper: Differentiable Top-k with Optimal Transport

Neural Information Processing Systems

Additional Feedback: Some comments: l. 117 The entropic OT is surely not more computationally friendly than a a top-k operator that simply sorts the vector. Same for the beam-search method, the present work seems to be a sequence of ad-hoc definitions rather than a principled objective. In particular it is important to make the optimization objective clear to enable future comparisons. Can the authors clearly distinguish their contributions from the ones of Cuturi et al, 2019? It seems that the implementation of the authors is potentially faster, which should be highlighted.


Review for NeurIPS paper: Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

Neural Information Processing Systems

Weaknesses: v) It is difficult to attribute the source of empirical gains, since the paper is presenting both a problem-specific architecture search space and a particular search method. The comparison to random is missing some potentially-important measures as it has no error bars or plot of the distribution. Though the comparison to evolutionary methods in Fig 2. is a good experiment along these lines, the (missing) random comparison is especially important [a]. The comparison to random is against the *best* model found by random search, instead of error bars or any modeling of the search space. This'd be important for comparisons that separate out the search vs design space as in [a,b].


Review for NeurIPS paper: A General Large Neighborhood Search Framework for Solving Integer Linear Programs

Neural Information Processing Systems

Additional Feedback: I wonder whether the used LNS requires a local search algorithm for solving the subproblem (Line 3). The authors argue that they set \gamma to 1 because it is a finite-horizon task. I completely agree that this is a possible choice; however even for finite-horizon tasks, \gamma can be set to values smaller than 1.0. I wonder how sensitive their approach is to such hyperparameters. The authors sampled 5 trajectories for each problem (instance?) to estimate the policy gradient. I'm not sure whether I understood that point fully.


Review for NeurIPS paper: GCOMB: Learning Budget-constrained Combinatorial Algorithms over Billion-sized Graphs

Neural Information Processing Systems

Three reviewers rated this paper as weak accept, and one as reject. All reviewers felt the paper combined learning-based techniques effectively to achieve impressive performance on combinatorial optimization problems in massive graphs. Reviewers describe the work as a combination of heuristics and modules consisting of existing techniques, but largely view the overall system as being significant, and comment on its impressive performance and an ablation study to justify individual components. The main criticisms were about missing comparisons to baselines. It was observed that the proposed method essentially does well on submodular coverage style problems where the greedy algorithm is often nearly optimal in practice and its main advantage is being much faster.