Collaborating Authors

 Valiant



Learning CNF formulas from uniform random solutions in the local lemma regime

Feng, Weiming, Yang, Xiongxin, Yu, Yixiao, Zhang, Yiyao

arXiv.org Machine Learning

We study the problem of learning an $n$-variable $k$-CNF formula $Φ$ from its i.i.d. uniform random solutions, which is equivalent to learning a Boolean Markov random field (MRF) with $k$-wise hard constraints. Revisiting Valiant's algorithm (Commun. ACM'84), we show that it can exactly learn (1) $k$-CNFs with bounded clause intersection size under Lovász local lemma type conditions, from $O(\log n)$ samples; and (2) random $k$-CNFs near the satisfiability threshold, from $\widetilde{O}(n^{\exp(-\sqrt{k})})$ samples. These results significantly improve on the previous $O(n^k)$ sample complexity. We further establish new information-theoretic lower bounds on the sample complexity of both exact and approximate learning from i.i.d. uniform random solutions.
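
Valiant's elimination algorithm revisited above is short to state: enumerate candidate clauses and discard every clause falsified by some sample; the conjunction of the survivors then implies the target formula. A minimal sketch in Python (for brevity it enumerates clauses on exactly k variables; the function and variable names are illustrative, not from the paper):

```python
from itertools import combinations, product

def valiant_learn_kcnf(n, k, samples):
    """Valiant-style elimination: keep every clause on k variables that is
    consistent with all positive samples (satisfying assignments).

    A literal is a (variable index, polarity) pair; a clause is a frozenset
    of literals; `samples` are boolean tuples of length n.
    """
    # Enumerate all clauses on exactly k distinct variables.
    clauses = set()
    for vars_ in combinations(range(n), k):
        for pols in product((False, True), repeat=k):
            clauses.add(frozenset(zip(vars_, pols)))
    # Discard any clause that some satisfying assignment falsifies.
    for x in samples:
        clauses = {c for c in clauses
                   if any(x[v] == pol for v, pol in c)}
    return clauses  # conjunction of survivors implies the target k-CNF

# Toy target: (x0 or x1) and (not x1 or x2), a 2-CNF over 3 variables.
def target(x):
    return (x[0] or x[1]) and ((not x[1]) or x[2])

sols = [x for x in product((False, True), repeat=3) if target(x)]
learned = valiant_learn_kcnf(3, 2, sols)
```

Given all solutions as samples, the surviving clauses include every clause of the target, so the learned conjunction is logically equivalent to it on this toy instance.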


What Does It Really Mean to Learn?

The New Yorker

I read "Middlemarch" for the first time during my sophomore year of college. Why would Dorothea, a young and intelligent woman, marry that annoying old man? How could she be so stupid? No one else in the class seemed to get it, either, and this pushed our professor over the edge. "Of course you don't understand," he roared, swilling a Diet Coke.


The Perceptron Algorithm Is Fast for Non-Malicious Distributions

Neural Information Processing Systems

Within the context of Valiant's protocol for learning, the Perceptron algorithm is shown to learn an arbitrary half-space in time $O(n^2/\epsilon^3)$ if $D$, the probability distribution of examples, is taken uniform over the unit sphere $S^n$. Here $\epsilon$ is the accuracy parameter. This is surprisingly fast, as "standard" approaches involve solution of a linear programming problem involving $O(1/\epsilon)$ constraints in $n$ dimensions. A modification of Valiant's distribution-independent protocol for learning is proposed in which the distribution and the function to be learned may be chosen by adversaries; however, these adversaries may not communicate. It is argued that this definition is more reasonable and applicable to real-world learning than Valiant's.
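
The perceptron update analyzed in the abstract is a one-line rule: on a mistake, add the misclassified example, signed by its label, to the weight vector. A minimal sketch under uniform-sphere examples (all names, sample sizes, and the margin filter are illustrative additions, not the paper's construction):

```python
import numpy as np

def perceptron(samples, labels, max_updates=10_000):
    """Classic perceptron: on a mistake, add the example (signed by its
    label) to the weight vector; stop when all examples are classified."""
    w = np.zeros(samples.shape[1])
    for _ in range(max_updates):
        preds = np.where(samples @ w > 0, 1, -1)
        mistakes = np.flatnonzero(preds != labels)
        if mistakes.size == 0:
            break                                  # consistent with every example
        w += labels[mistakes[0]] * samples[mistakes[0]]
    return w

# Examples uniform on the unit sphere, labeled by a hidden half-space.
# Points very close to the decision boundary are dropped so the classical
# (1/margin)^2 mistake bound guarantees convergence in this demo.
rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=(400, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)      # project onto the sphere
w_true = rng.normal(size=n)
w_true /= np.linalg.norm(w_true)
margins = x @ w_true
keep = np.abs(margins) > 0.1
x, y = x[keep], np.where(margins[keep] > 0, 1, -1)
w_hat = perceptron(x, y)
```

With margin at least 0.1 and unit-norm examples, the perceptron convergence theorem bounds the number of updates by 100, so the loop terminates with a consistent half-space.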


Agnostic PAC-Learning of Functions on Analog Neural Nets

Neural Information Processing Systems

There exist a number of negative results ([J], [BR], [KV]) about learning on neural nets in Valiant's model [V] for probably approximately correct learning ("PAC-learning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinity. Hence this analysis is less adequate for the investigation of learning on a small fixed neural net. The latter type of learning problem gives rise to a different kind of asymptotic question: Can the true error of the neural net be brought arbitrarily close to that of a neural net with "optimal" weights through sufficiently long training? In this paper we employ some new arguments in order to give a positive answer to this question in Haussler's rather realistic refinement of Valiant's model for PAC-learning ([H], [KSS]). In this more realistic model no a-priori assumptions are required about the "learning target", noise is permitted in the training data, and the inputs and outputs are not restricted to boolean values.


On the Evolvability of Monotone Conjunctions with an Evolutionary Mutation Mechanism

Diochnos, Dimitrios

Journal of Artificial Intelligence Research

Valiant (2009) introduced a framework for a quantitative approach to evolution, called evolvability. The idea is, roughly, that there is an ideal behavior in every environment and the feedback that the various organisms receive during evolution indicates how close their behavior is to ideal. Ultimately, evolvability aims at modeling and explaining mechanisms that allow near-optimal behavior of organisms while exploiting realistic computational resources. Due to a result by Feldman (2008), evolvability is equivalent to learning in the correlational statistical query (CSQ) model (Bshouty & Feldman, 2002). Thus, evolvability algorithms correspond to a special type of local search learning algorithms that fall under the umbrella of the probably approximately correct (PAC) model of learning (Valiant, 1984).


Working Memory for Online Memory Binding Tasks: A Hybrid Model

Yazdi, Seyed Mohammad Mahdi Heidarpoor, Abbassian, Abdolhossein

arXiv.org Artificial Intelligence

Working Memory is the brain module that holds and manipulates information online. In this work, we design a hybrid model in which a simple feed-forward network is coupled to a balanced random network via a read-write vector called the interface vector. First, we consider some simple memory binding tasks in which the output is set to be a copy of the given input and a selective sequence of previous inputs online. Next, we design a more complex binding task based on a cue that encodes binding relations. The important result is that our dual-component model of working memory shows good performance with learning restricted to the feed-forward component only. Here we take advantage of the random network's properties without learning. To our knowledge, this is the first time that random networks, acting as a flexible memory, have been shown to play an important role in online binding tasks. We may interpret our results as a candidate model of working memory in which the feed-forward network learns to interact with the temporary-storage random network as an attention-controlling executive system.


Entropy Rate Estimation for Markov Chains with Large State Space

Han, Yanjun, Jiao, Jiantao, Lee, Chuan-Zheng, Weissman, Tsachy, Wu, Yihong, Yu, Tiancheng

Neural Information Processing Systems

Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on $S$ elements with independent samples, the optimal sample complexity scales sublinearly with $S$ as $\Theta(\frac{S}{\log S})$ as shown by Valiant and Valiant \cite{Valiant--Valiant2011}. Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations. In comparison, the empirical entropy rate requires at least $\Omega(S^2)$ samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we also apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora, which provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.
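
The empirical (plug-in) entropy rate that the abstract compares against can be computed directly from bigram counts along the sample path: estimate the transition matrix row by row and average the row entropies under the empirical state frequencies. A minimal sketch (illustrative code, not the minimax-optimal estimator the paper develops):

```python
import math
import random
from collections import Counter

def empirical_entropy_rate(path):
    """Plug-in estimator: empirical transition-row entropies, weighted by
    empirical state frequencies. Returns bits per symbol."""
    bigrams = Counter(zip(path, path[1:]))
    visits = Counter(path[:-1])          # transitions emitted from each state
    total = len(path) - 1
    h = 0.0
    for (i, j), count in bigrams.items():
        p_ij = count / visits[i]         # empirical P(j | i)
        h -= (count / total) * math.log2(p_ij)
    return h

# Sanity check on a memoryless chain: i.i.d. uniform bits have rate 1 bit.
rng = random.Random(0)
path = [rng.randrange(2) for _ in range(200_000)]
estimate = empirical_entropy_rate(path)
```

For a binary alphabet the plug-in estimate is cheap and accurate, which illustrates why the difficulty the paper addresses only appears once the state space $S$ becomes large relative to the path length.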


Query-driven PAC-Learning for Reasoning

Juba, Brendan

arXiv.org Artificial Intelligence

We consider the problem of learning rules from a data set that support a proof of a given query, under Valiant's PAC-Semantics. We show how any backward proof search algorithm that is sufficiently oblivious to the contents of its knowledge base can be modified to learn such rules while it searches for a proof using those rules. We note that this gives such algorithms for standard logics such as chaining and resolution.