valiant
What is Learnable in Valiant's Theory of the Learnable?
Hanneke, Steve, Mehrotra, Anay, Velegkas, Grigoris, Zampetakis, Manolis
Valiant's 1984 paper is widely credited with introducing the PAC learning model, but it, in fact, introduced a different model: unlike PAC learning, the learner receives only positives, may issue membership queries, and must output a hypothesis with no false positives. Prior work characterized variants, including the case without queries. We revisit Valiant's original model and ask: *Which classes are learnable in it?* For every finite domain, including Valiant's Boolean-hypercube setting, we show that a class is learnable if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. This is a new variant of sample compression where the learner certifies samples via a short interaction with the membership oracle. Our characterization shows that learnability in Valiant's model is strictly sandwiched between learnability in the PAC model and the variant of Valiant's model without membership queries. This is one of the rare cases where introducing membership queries changes the set of learnable classes, and not just the sample or computational complexity. Next, we study the natural extension of the model to arbitrary domains. While we do not obtain an exact characterization, our techniques readily generalize and show that the same strict sandwiching persists. Finally, we show that $d$-dimensional halfspaces, which are not learnable without queries, are learnable with queries: we give a $\mathrm{poly}(d) \tilde{O}(1/ฮต)$ sample and $\mathrm{poly}(d) \mathrm{polylog}(1/ฮต)$ query algorithm, and prove that at least $ฮฉ(d)$ samples or queries are necessary. To our knowledge, this is the first algorithm for halfspaces in Valiant's model. Together, these results uncover a surprisingly rich theory behind Valiant's original notion of learnability and introduce ideas that may be of independent interest in learning theory.
Entropy Rate Estimation for Markov Chains with Large State Space
Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on $S$ elements with independent samples, the optimal sample complexity scales sublinearly with $S$ as $\Theta(\frac{S}{\log S})$ as shown by Valiant and Valiant \cite{Valiant--Valiant2011}. Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations.
Entropy Rate Estimation for Markov Chains with Large State Space
Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on $S$ elements with independent samples, the optimal sample complexity scales sublinearly with $S$ as $\Theta(\frac{S}{\log S})$ as shown by Valiant and Valiant \cite{Valiant--Valiant2011}. Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations.
Learning CNF formulas from uniform random solutions in the local lemma regime
Feng, Weiming, Yang, Xiongxin, Yu, Yixiao, Zhang, Yiyao
We study the problem of learning a $n$-variables $k$-CNF formula $ฮฆ$ from its i.i.d. uniform random solutions, which is equivalent to learning a Boolean Markov random field (MRF) with $k$-wise hard constraints. Revisiting Valiant's algorithm (Commun. ACM'84), we show that it can exactly learn (1) $k$-CNFs with bounded clause intersection size under Lovรกsz local lemma type conditions, from $O(\log n)$ samples; and (2) random $k$-CNFs near the satisfiability threshold, from $\widetilde{O}(n^{\exp(-\sqrt{k})})$ samples. These results significantly improve the previous $O(n^k)$ sample complexity. We further establish new information-theoretic lower bounds on sample complexity for both exact and approximate learning from i.i.d. uniform random solutions.
What Does It Really Mean to Learn?
I read "Middlemarch" for the first time during my sophomore year of college. Why would Dorothea, a young and intelligent woman, marry that annoying old man? How could she be so stupid? No one else in the class seemed to get it, either, and this pushed our professor over the edge. "Of course you don't understand," he roared, swilling a Diet Coke.
The Perceptron Algorithm Is Fast for Non-Malicious Distributions
Within the context of Valiant's protocol for learning, the Perceptron algorithm is shown to learn an arbitrary half-space in time O(r;;) if D, the proba(cid:173) bility distribution of examples, is taken uniform over the unit sphere sn. Here f is the accuracy parameter. This is surprisingly fast, as "standard" approaches involve solution of a linear programming problem involving O( 7') constraints in n dimen(cid:173) sions. A modification of Valiant's distribution independent protocol for learning is proposed in which the distribution and the function to be learned may be cho(cid:173) sen by adversaries, however these adversaries may not communicate. It is argued that this definition is more reasonable and applicable to real world learning than Valiant's.
Agnostic PAC-Learning of Functions on Analog Neural Nets
There exist a number of negative results ([J), [BR), [KV]) about learning on neural nets in Valiant's model [V) for probably approx(cid:173) imately correct learning ("PAC-learning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinit.y. Hence this analysis is less ad(cid:173) equate for the investigation of learning on a small fixed neural net. The latter type of learning problem gives rise to a different kind of asymptotic question: Can the true error of the neural net be brought arbitrarily close to that of a neural net with "optimal" weights through sufficiently long training? In this paper we employ some new arguments ill order to give a positive answer to this question in Haussler's rather realistic refinement of Valiant's model for PAC-learning ([H), [KSS)). In this more realistic model no a-priori assumptions are required about the "learning target", noise is permitted in the training data, and the inputs and outputs are not restricted to boolean values.