
 Computational Learning Theory


On Proper Learnability between Average- and Worst-case Robustness

Neural Information Processing Systems

Recently, Montasser et al. [2019] showed that finite VC dimension is not sufficient for proper adversarially robust PAC learning. In light of this hardness, there is a growing effort to study what type of relaxations to the adversarially robust PAC learning setup can enable proper learnability. In this work, we initiate the study of proper learning under relaxations of the adversarially robust loss. We give a family of robust loss relaxations under which VC classes are properly PAC learnable with sample complexity close to what one would require in the standard PAC learning setup. On the other hand, we show that for an existing and natural relaxation of the adversarially robust loss, finite VC dimension is not sufficient for proper learning. Lastly, we give new generalization guarantees for the adversarially robust empirical risk minimizer.
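For orientation, one common way of writing the worst-case adversarially robust loss and an average-case relaxation of it is the following (the perturbation set U(x) and the perturbation distribution \mu_x are our notation, meant only to illustrate the kind of relaxation being studied):

    \ell^{\sup}_{U}(h; x, y) = \sup_{x' \in U(x)} \mathbf{1}[h(x') \neq y],
    \qquad
    \ell^{\mathrm{avg}}_{U}(h; x, y) = \mathbb{E}_{x' \sim \mu_x}\,\mathbf{1}[h(x') \neq y].

Relaxations lying between these two extremes are the kind suggested by the title.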


Uniform Convergence with Square-Root Lipschitz Loss

Neural Information Processing Systems

We establish generic uniform convergence guarantees for Gaussian data in terms of the Rademacher complexity of the hypothesis class and the Lipschitz constant of the square root of the scalar loss function. We show how these guarantees substantially generalize previous results based on smoothness (the Lipschitz constant of the derivative) and allow us to handle the broader class of square-root-Lipschitz losses, which also includes non-smooth loss functions appropriate for studying phase retrieval and ReLU regression, as well as to rederive and better understand "optimistic rate" and interpolation learning guarantees.
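As a rough illustration (notation ours): call a loss \ell H-square-root-Lipschitz if w \mapsto \sqrt{\ell(w; z)} is H-Lipschitz for every sample z. The phase-retrieval loss mentioned above is a typical example,

    \ell(w; x, y) = \big(y - |\langle w, x \rangle|\big)^2,
    \qquad
    \sqrt{\ell(w; x, y)} = \big|\, y - |\langle w, x \rangle| \,\big|,

which is non-smooth at \langle w, x \rangle = 0 even though its square root is 1-Lipschitz as a function of \langle w, x \rangle.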


Weitzman's Rule for Pandora's Box with Correlations

Neural Information Processing Systems

In the Pandora's Box problem, we are given n boxes, each with a fixed opening cost and an unknown value drawn from a known distribution that is revealed only if we pay the opening cost. Our goal is to find a strategy for opening boxes that minimizes the sum of the value selected and the total opening cost paid.
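As a point of reference, the classical Weitzman rule for independent value distributions can be sketched in a few lines. The sketch below is the minimization variant suggested by the problem statement, with function names and numerical tolerances of our choosing; it is not the paper's algorithm for correlated distributions:

    import random

    def reservation_value(samples, cost, lo=0.0, hi=1e9, iters=60):
        # Solve E[(sigma - X)^+] = cost for sigma by bisection; the left-hand
        # side is non-decreasing in sigma, so bisection converges.
        def expected_gain(sigma):
            return sum(max(sigma - x, 0.0) for x in samples) / len(samples)
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if expected_gain(mid) < cost:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def weitzman_min(boxes):
        # boxes: list of (opening_cost, samples) pairs, with `samples` standing
        # in for the known value distribution of a box.  Open boxes in
        # increasing order of reservation value, stop once the best value found
        # is below every remaining reservation value, and return the total
        # opening cost paid plus the value selected.
        ordered = sorted(((reservation_value(s, c), c, s) for c, s in boxes),
                         key=lambda t: t[0])
        best, paid = float("inf"), 0.0
        for sigma, cost, samples in ordered:
            if best <= sigma:
                break
            paid += cost
            best = min(best, random.choice(samples))  # simulate the revealed value
        return paid + best

    # Toy usage: three boxes with unit opening cost.
    print(weitzman_min([(1.0, [2.0, 8.0]), (1.0, [1.0, 9.0]), (1.0, [5.0, 6.0])]))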


An Optimal Elimination Algorithm for Learning a Best Arm (Avinatan Hassidim)

Neural Information Processing Systems

We consider the classic problem of (ε, δ)-PAC learning a best arm, where the goal is to identify, with confidence 1 − δ, an arm whose mean is an ε-approximation to that of the arm with the highest mean in a multi-armed bandit setting. This problem is one of the most fundamental in statistics and learning theory, yet, somewhat surprisingly, its worst-case sample complexity is not well understood. In this paper we propose a new approach for (ε, δ)-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to exactly the optimal sample complexity of (ε, δ)-learning the mean of n arms separately, and we complement this result with a conditional matching lower bound.
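For contrast, a minimal sketch of a standard successive-elimination routine (not the paper's algorithm; the confidence radius comes from Hoeffding's inequality plus a union bound, and the constants are not optimized):

    import math
    import random

    def successive_elimination(arms, eps, delta):
        # arms: zero-argument callables returning stochastic rewards in [0, 1].
        # Returns the index of an arm whose mean is within eps of the best
        # arm's mean, with probability at least 1 - delta.
        n = len(arms)
        if n == 1:
            return 0
        active, sums, t = list(range(n)), [0.0] * n, 0
        while len(active) > 1:
            t += 1
            for i in active:
                sums[i] += arms[i]()
            rad = math.sqrt(math.log(4.0 * n * t * t / delta) / (2.0 * t))
            if 2.0 * rad <= eps:
                break  # the empirically best surviving arm is eps-optimal w.h.p.
            best = max(sums[i] / t for i in active)
            active = [i for i in active if sums[i] / t >= best - 2.0 * rad]
        return max(active, key=lambda i: sums[i] / t)

    # Toy usage: Bernoulli arms with means 0.5, 0.6 and 0.9.
    arms = [lambda p=p: float(random.random() < p) for p in (0.5, 0.6, 0.9)]
    print(successive_elimination(arms, eps=0.05, delta=0.05))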


VC Dimension of Tensor Networks (Supplementary Material)

Neural Information Processing Systems

We start with the pseudo-dimension introduced in Definition 1. It then follows from Warren's theorem (Theorem 3) that f(n, G) ≤ 4en|V|N. We use a symmetrization lemma and a corollary of Hoeffding's inequality. In this section, we give the proofs of all the lower bounds appearing in Table 1. The result then directly follows from Lemma 10. We start with the tensor train case; the tensor ring case will be handled similarly.
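For reference, the standard form of Hoeffding's inequality invoked in such arguments, for i.i.d. random variables X_1, \dots, X_n taking values in [a, b], is

    \Pr\left[ \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}[X_1] \right| \ge t \right] \le 2 \exp\!\left( - \frac{2 n t^2}{(b-a)^2} \right).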


Private Hypothesis Selection

Neural Information Processing Systems

We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution P and a set of m probability distributions H, the goal is to output, in an ε-differentially private manner, a distribution from H whose total variation distance to P is comparable to that of the best such distribution (which we denote by α).
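For reference, the two standard definitions in play (not specific to this paper): the total variation distance between distributions and ε-differential privacy of a mechanism M are

    d_{\mathrm{TV}}(P, Q) = \sup_{S} \left| P(S) - Q(S) \right|,
    \qquad
    \Pr[M(D) \in O] \le e^{\varepsilon} \, \Pr[M(D') \in O]

for every event S, every output set O, and every pair of datasets D, D' differing in a single record. The stated goal is then a hypothesis from H whose total variation distance to P is within a constant factor (plus a small additive error) of α.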


Algorithm 2: Reduce (s, k)-Turing-optimal learning to Turing-optimal learning. Input: learner A

Neural Information Processing Systems

The reduction used in Claim 1 to show the equivalence of Turing-optimal and (s, k)-Turing-optimal learning is shown in Algorithm 2. Note that the algorithm was chosen for its simplicity rather than for optimizing parameters. Finally, Algorithm 2 clearly runs in polynomial time, i.e., time poly(d, m + M), due to the timeout and the number of iterations. We now move to the proof of Claim 2. Note that the Turing-optimal learner A is a PAC learner, even though A does not even require the accuracy and confidence parameters as inputs. By the union bound, this means that with probability at least 1 − δ, it outputs a classifier with error at most ε, as required by PAC learning. In this section we give a complete proof of Claim 1.
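The union-bound step is the standard one: if each of k relevant events (say, one failure event per run or per candidate hypothesis, with the failure budget split evenly, which is an assumption of this sketch) occurs with probability at most δ/k, then

    \Pr\left[ \bigcup_{i=1}^{k} B_i \right] \le \sum_{i=1}^{k} \Pr[B_i] \le k \cdot \frac{\delta}{k} = \delta,

so all guarantees hold simultaneously with probability at least 1 − δ.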


Learning Collaborative Policies to Solve NP-hard Routing Problems (Minsu Kim, Jinkyoo Park, Joungho Kim)

Neural Information Processing Systems

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, showing a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find a near-optimum solution using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while exploring the full combinatorial action space (i.e., sequences of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward that encourages the seeder to find diverse solutions. The reviser, in turn, modifies each candidate solution generated by the seeder: it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. The reviser is thus trained to improve the candidate solution's quality, focusing on the reduced solution space (which is beneficial for exploitation). Extensive experiments demonstrate that the proposed two-policy collaboration scheme improves over single-policy DRL frameworks on various NP-hard routing problems, including TSP, the prize collecting TSP (PCTSP), and the capacitated vehicle routing problem (CVRP).
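To illustrate the entropy-regularized training signal for the seeder, here is a schematic numpy version with names of our choosing; the paper's exact reward shaping, baseline, and network are not reproduced:

    import numpy as np

    def seeder_loss(action_log_probs, trajectory_return, step_probs, beta=0.01):
        # Schematic entropy-regularized REINFORCE objective for a solution-
        # constructing policy: the first term pushes toward higher returns
        # (shorter tours), the entropy bonus pushes toward diverse seeds.
        # action_log_probs: (T,) log-probability of each chosen action.
        # trajectory_return: scalar, e.g. negative tour length (no baseline).
        # step_probs: (T, A) full action distribution at every step.
        entropy = -np.sum(step_probs * np.log(step_probs + 1e-12), axis=1).mean()
        reinforce = -(trajectory_return * action_log_probs).mean()
        return reinforce - beta * entropy

    # Toy usage with T = 2 steps and A = 3 candidate actions.
    log_p = np.log(np.array([0.5, 0.4]))
    probs = np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]])
    print(seeder_loss(log_p, trajectory_return=-7.3, step_probs=probs))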


Review for NeurIPS paper: Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

Neural Information Processing Systems

Weaknesses: There have been several "proof-of-concept" papers using deep learning for SAT. This is an interesting paper, but it is yet another proof of concept that uses relatively small SAT instances, and it falls short of showing true potential for improving the state of the art of SAT solvers. The experimental section is limited: the authors mainly consider random 3-SAT instances (at the phase transition, which is good), but these are relatively small (up to 250 variables, whereas SAT solvers can handle considerably larger problems: satisfiable instances with thousands of variables and millions of clauses, and unsatisfiable instances with hundreds of variables and thousands of clauses; see, e.g., the 2016 SAT Competition). They also consider graph coloring instances, but again these are not very large problems.


Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

Neural Information Processing Systems

We present Graph-Q-SAT, a branching heuristic for a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation. Solvers using Graph-Q-SAT are complete SAT solvers that provide either a satisfying assignment or a proof of unsatisfiability, which is required for many SAT applications. The branching heuristics commonly used in SAT solvers make poor decisions during their warm-up period, whereas Graph-Q-SAT is trained to examine the structure of the particular problem instance to make better decisions early in the search. Training Graph-Q-SAT is data efficient and does not require elaborate dataset preparation or feature engineering. We train Graph-Q-SAT using RL interfacing with the MiniSat solver and show that Graph-Q-SAT can reduce the number of iterations required to solve SAT problems by 2-3X. Furthermore, it generalizes to unsatisfiable SAT instances, as well as to problems with 5X more variables than it was trained on. We show that, for larger problems, reductions in the number of iterations lead to wall clock time reductions, the ultimate goal when designing heuristics. We also show positive zero-shot transfer behavior when testing Graph-Q-SAT on a task family different from the one used for training. While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof of concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.
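To make the role of the learned heuristic concrete, here is a minimal sketch of how per-literal Q-values could drive branching; the graph network that would produce these values is stubbed out, and the names are ours rather than MiniSat's or the paper's API:

    def pick_branching_literal(q_values, assigned_vars):
        # q_values: mapping from literal (e.g. 3 or -3) to the Q-value predicted
        # for branching on it (in Graph-Q-SAT such values come from a graph
        # network over the clause-variable graph; that network is not shown).
        # assigned_vars: set of variables that already have a value.
        # Returns the unassigned literal with the largest Q-value, or None.
        candidates = {lit: q for lit, q in q_values.items()
                      if abs(lit) not in assigned_vars}
        return max(candidates, key=candidates.get) if candidates else None

    # Toy usage: variable 2 is already assigned, so the solver branches on -3.
    print(pick_branching_literal({1: 0.2, -1: 0.1, 2: 0.9, -3: 0.4}, {2}))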