Sample compression schemes for balls in graphs
Chalopin, Jérémie, Chepoi, Victor, Mc Inerney, Fionn, Ratel, Sébastien, Vaxès, Yann
One of the open problems in machine learning is whether every set family of VC-dimension $d$ admits a sample compression scheme of size $O(d)$. In this paper, we study this problem for balls in graphs. For a ball $B=B_r(x)$ of a graph $G=(V,E)$, a realizable sample for $B$ is a signed subset $X=(X^+,X^-)$ of $V$ such that $B$ contains $X^+$ and is disjoint from $X^-$. A proper sample compression scheme of size $k$ consists of a compressor and a reconstructor. The compressor maps any realizable sample $X$ to a subsample $X'$ of size at most $k$. The reconstructor maps each such subsample $X'$ to a ball $B'$ of $G$ such that $B'$ includes $X^+$ and is disjoint from $X^-$. For balls of arbitrary radius $r$, we design proper labeled sample compression schemes of size $2$ for trees, of size $3$ for cycles, of size $4$ for interval graphs, of size $6$ for trees of cycles, and of size $22$ for cube-free median graphs. For balls of a given radius, we design proper labeled sample compression schemes of size $2$ for trees and of size $4$ for interval graphs. We also design approximate sample compression schemes of size $2$ for balls of $\delta$-hyperbolic graphs.
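To make the compressor/reconstructor pair concrete, here is a minimal illustrative sketch (not the paper's construction) for the simplest case: balls of a path graph on vertices $0,\dots,n-1$, whose balls are exactly intervals of consecutive vertices. Keeping only the two extreme positive examples gives a proper labeled scheme of size $2$: the minimal interval spanning $X^+$ is contained in every interval containing $X^+$, so it avoids $X^-$ whenever the sample is realizable. The names `compress` and `reconstruct` are ours, and for simplicity we allow the empty interval as a degenerate ball when $X^+$ is empty.

```python
def compress(pos, neg):
    """Compress a realizable sample (pos, neg) on a path graph to
    at most 2 labeled vertices: the extreme positive examples.

    Realizable means some interval of the path contains all of pos
    and none of neg. The negatives can be discarded entirely.
    """
    if not pos:
        return []  # degenerate case: reconstruct the empty interval
    return [(min(pos), '+'), (max(pos), '+')]

def reconstruct(subsample):
    """Map a compressed subsample back to a ball of the path,
    i.e., the smallest interval spanning the kept vertices."""
    if not subsample:
        return set()  # the empty interval (degenerate ball)
    points = [v for v, _ in subsample]
    return set(range(min(points), max(points) + 1))
```

For instance, compressing the sample $X^+=\{3,5,7\}$, $X^-=\{1,9\}$ keeps only $3$ and $7$, and reconstruction returns the interval $\{3,4,5,6,7\}$, which contains $X^+$ and misses $X^-$.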
Labeled sample compression schemes for complexes of oriented matroids
Chepoi, Victor, Knauer, Kolja, Philibert, Manon
Littlestone and Warmuth [51] introduced sample compression schemes as an abstraction of the underlying structure of learning algorithms. Roughly, the aim of a sample compression scheme is to compress samples of a concept class (i.e., of a set system) C as much as possible, such that data coherent with the original samples can be reconstructed from the compressed data. There are two types of sample compression schemes: labeled (see [35, 51]) and unlabeled (see [7, 34, 49]). A labeled compression scheme of size k compresses every sample of C to a labeled subsample of size at most k, whereas an unlabeled compression scheme of size k compresses every sample of C to a subset of size at most k of the domain of the sample (see the end of the introduction for precise definitions). The Vapnik-Chervonenkis dimension (VC-dimension) of a set system was introduced in [69] as a complexity measure of set systems. VC-dimension is central in PAC-learning and plays an important role in combinatorics, algorithmics, discrete geometry, and combinatorial optimization. In particular, it coincides with the rank in the theory of (complexes of) oriented matroids. Furthermore, within machine learning and closely tied to the topic of this paper, the sample compression conjecture of [35] and [51] states that any set system of VC-dimension d has a labeled sample compression scheme of size O(d). This question remains one of the oldest open problems in computational learning theory.
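A textbook-style example may help fix the definitions (this is a standard illustration, not taken from the paper): the class of threshold concepts $c_t = \{x : x \ge t\}$ on a finite domain has VC-dimension $1$ and admits a labeled compression scheme of size $1$. It suffices to keep the smallest positive example (or, failing that, the largest negative one); the function names below are ours.

```python
def compress(pos, neg):
    """Compress a realizable sample of a threshold concept c_t = {x >= t}
    to at most 1 labeled point. Realizability means max(neg) < min(pos)."""
    if pos:
        return [(min(pos), '+')]
    if neg:
        return [(max(neg), '-')]
    return []

def reconstruct(subsample, n=100):
    """Return a threshold concept on {0, ..., n-1} consistent with every
    sample that compresses to this subsample."""
    if not subsample:
        return set(range(n))  # any concept is consistent with the empty sample
    v, sign = subsample[0]
    t = v if sign == '+' else v + 1  # place the threshold just past the kept point
    return set(range(t, n))
```

Since any realizable sample satisfies $\max(X^-) < \min(X^+)$, setting the threshold at the kept point recovers a concept containing all positives and no negatives, which is exactly the reconstruction guarantee in the definition above.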