some specific questions, but will incorporate all feedback in the final version

Neural Information Processing Systems

We thank the reviewers for their careful reading and insightful comments. We will add this in the final version. ... Transformer-based) models to further shrink the search space. ... Number of nodes in the graphs seems to be quite low ( 200 for GNMT). Is there some manual grouping operation performed on the computational graph?


Scalable Online Planning via Reinforcement Learning Fine-Tuning

Neural Information Processing Systems

Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker. However, the search methods used in these games, and in many other settings, are tabular. Tabular search methods do not scale well with the size of the search space, and this problem is exacerbated by stochasticity and partial observability. In this work, we replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning, and show that this approach outperforms state-of-the-art search algorithms in benchmark settings. In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi, and demonstrate the generality of our algorithm by showing that it also outperforms tabular search in the Atari game Ms. Pacman.



Information Theoretic Regret Bounds for Online Nonlinear Control

Sham Kakade

Neural Information Processing Systems

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics.




Online learning with dynamics: A minimax perspective

Neural Information Processing Systems

Given such a setup, a natural question is how one measures the performance of the learner. Classical online learning studies one such notion of performance, known as regret.
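To make the notion of regret concrete, here is a minimal sketch of the standard external regret from classical online learning: the learner's cumulative loss minus the cumulative loss of the best single action in hindsight. The function name `external_regret` and the toy loss table are hypothetical, and the paper's own notion of performance in the presence of dynamics may differ from this classical definition.

```python
def external_regret(losses, chosen):
    """Classical external regret (illustrative sketch, not the paper's definition).

    losses[t][a] is the loss of action a at round t.
    chosen[t] is the action the learner actually played at round t.
    """
    # Total loss the learner actually suffered.
    learner_loss = sum(losses[t][a] for t, a in enumerate(chosen))
    # Total loss of the best single fixed action, chosen with hindsight.
    num_actions = len(losses[0])
    best_fixed_loss = min(
        sum(round_losses[a] for round_losses in losses)
        for a in range(num_actions)
    )
    return learner_loss - best_fixed_loss


# Hypothetical toy data: 3 rounds, 2 actions.
example_losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
example_choices = [1, 0, 1]  # the learner picks the worse action every round
example_regret = external_regret(example_losses, example_choices)
```

In this toy run the learner loses 3.0 in total while always playing action 0 would have lost 1.0, so the regret is 2.0.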



Neural Topological Ordering for Computation Graphs

Neural Information Processing Systems

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. Work completed during employment at Qualcomm Technologies, Inc. 36th Conference on Neural Information Processing Systems (NeurIPS 2022). ... of the Directed Acyclic Graph (DAG) that encodes the precedence constraints, which induces a Combinatorial Optimization (CO) [3] problem that is in general computationally hard [4].
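As background for the precedence constraints mentioned above, the sketch below computes one valid topological order of a DAG with Kahn's algorithm and checks that it respects every edge. This is a classical baseline, not the neural method of the paper; the function names and the four-node example graph are hypothetical.

```python
from collections import deque


def topological_order(num_nodes, edges):
    """Return one valid topological order of a DAG using Kahn's algorithm."""
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Nodes with no unmet predecessors can be scheduled immediately.
    ready = deque(n for n in range(num_nodes) if in_degree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # If some node was never scheduled, the graph has a cycle.
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle; no topological order exists")
    return order


def respects_precedence(order, edges):
    """Check that every edge (src, dst) has src placed before dst."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[src] < position[dst] for src, dst in edges)


# Hypothetical diamond-shaped graph: 0 -> {1, 2} -> 3.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
example_order = topological_order(4, example_edges)
```

A DAG generally admits many valid orders (here both 0, 1, 2, 3 and 0, 2, 1, 3 qualify). Finding any one of them is cheap; the hardness noted above arises when a particular order must be selected to optimize some objective, which this sketch does not attempt.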