time compared with Gurobi in solving several benchmark ILP problems

Neural Information Processing Systems

We thank the reviewers for their valuable comments. We first address the shared concern about the paper's novelty, and then respond to each reviewer's individual comments. This is an important question to answer for practical relevance. Please see the general comment above.





Globally Convergent Policy Search for Output Estimation

Neural Information Processing Systems

We introduce the first direct policy search algorithm that provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies that arise when optimizing over filters that maintain an internal state. In this paper, we provide a new perspective on this challenging problem based on the notion of informativity, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system. We show that informativity overcomes the aforementioned degeneracy. Specifically, we propose a regularizer that explicitly enforces informativity, and establish that gradient descent on this regularized objective, combined with a "reconditioning step", converges to the globally optimal cost at an O(1/T) rate.


A Comparison with Other General MLCO Frameworks

Neural Information Processing Systems

We would also like to discuss the limitations of these approaches, including ours. As shown in Tab. 4, the PPO-Single that serves as a baseline in our paper is designed following ... As shown in Tab. 4, NeuRewriter is the most general because it can be viewed as a learning-based local ... It is also worth noting that some problems are beyond our knowledge to tackle, e.g. the expression simplification problem, and it may require experts with specific domain ... We have discussed the model details of PPO-BiHyb in Sec. 4, and in this section, we discuss the DAG. Considering the structure of the DAG, we design two GCNs: the first GCN processes the original DAG, and the second GCN processes the DAG with all edges reversed. The doubly-stochastic matrix predicted by SK is processed by considering the partial matching matrix. Graph-level features are obtained via attention pooling and are fed to the critic net.
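The two-GCN encoding and attention pooling described in this excerpt can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The mean-aggregation layer, the concatenation of the two GCN outputs, the single-query attention pooling, and all names (`gcn_layer`, `attention_pool`, `w_fwd`, `w_rev`, `query`) are assumptions made for illustration. The excerpt states only that one GCN sees the original DAG, the other sees the DAG with all edges reversed, and that graph-level features come from attention pooling.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One mean-aggregation graph convolution: ReLU(D^-1 (A + I) X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)    # row sums, always >= 1 because of self-loops
    return np.maximum((a_hat / deg) @ feats @ weight, 0.0)

def attention_pool(node_feats, query):
    """Softmax-weighted sum of node features, giving one graph-level vector."""
    scores = node_feats @ query
    scores = scores - scores.max()            # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ node_feats

rng = np.random.default_rng(0)

# Toy DAG with edges 0 -> 1, 0 -> 2, 1 -> 2; adj[i, j] = 1 means an edge i -> j,
# so in gcn_layer each node averages over itself and the nodes it points to.
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
x = rng.normal(size=(3, 4))                   # 3 nodes, 4 input features each

w_fwd = rng.normal(size=(4, 8))               # weights of the first GCN (hypothetical sizes)
w_rev = rng.normal(size=(4, 8))               # weights of the second GCN

h_fwd = gcn_layer(adj, x, w_fwd)              # first GCN: original DAG
h_rev = gcn_layer(adj.T, x, w_rev)            # second GCN: all edges reversed
h = np.concatenate([h_fwd, h_rev], axis=1)    # assumed way of combining the two views

query = rng.normal(size=h.shape[1])           # attention query (learned in a real model)
graph_feat = attention_pool(h, query)         # graph-level feature for a critic network
```

Transposing the adjacency matrix is what reverses every edge. The second GCN therefore aggregates information from predecessors, while the first aggregates from successors.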