Computational Learning Theory
Summarizing Event Sequences with Serial Episodes: A Statistical Model and an Application
In this paper we address the problem of discovering a small set of frequent serial episodes from sequential data so as to adequately characterize or summarize the data. We discuss an algorithm based on the Minimum Description Length (MDL) principle and the algorithm is a slight modification of an earlier method, called CSC-2. We present a novel generative model for sequence data containing prominent pairs of serial episodes and, using this, provide some statistical justification for the algorithm. We believe this is the first instance of such a statistical justification for an MDL based algorithm for summarizing event sequence data. We then present a novel application of this data mining algorithm in text classification. By considering text documents as temporal sequences of words, the data mining algorithm can find a set of characteristic episodes for all the training data as a whole. The words that are part of these characteristic episodes could then be considered the only relevant words for the dictionary thus resulting in a considerably reduced feature vector dimension. We show, through simulation experiments using benchmark data sets, that the discovered frequent episodes can be used to achieve more than four-fold reduction in dictionary size without losing any classification accuracy.
A Learning Framework for Distribution-Based Game-Theoretic Solution Concepts
The past few years have seen several works establishing PAC frameworks for solving various problems in economic domains; these include optimal auction design, approximate optima of submodular functions, stable partitions and payoff divisions in cooperative games and more. In this work, we provide a unified learning-theoretic methodology for modeling these problems, and establish some useful tools for determining whether a given economic solution concept can be learned from data. Our learning theoretic framework generalizes a notion of function space dimension --- the graph dimension --- adapting it to the solution concept learning domain. We identify sufficient conditions for the PAC learnability of solution concepts, and show that results in existing works can be immediately derived using our general methodology. Finally, we apply our methods in other economic domains, yielding a novel notion of PAC competitive equilibrium and PAC Condorcet winners.
Optimal Collusion-Free Teaching
Kirkpatrick, David, Simon, Hans U., Zilles, Sandra
Formal models of learning from teachers need to respect certain criteria to avoid collusion. The most commonly accepted notion of collusion-freeness was proposed by Goldman and Mathias (1996), and various teaching models obeying their criterion have been studied. For each model $M$ and each concept class $\mathcal{C}$, a parameter $M$-$\mathrm{TD}(\mathcal{C})$ refers to the teaching dimension of concept class $\mathcal{C}$ in model $M$---defined to be the number of examples required for teaching a concept, in the worst case over all concepts in $\mathcal{C}$. This paper introduces a new model of teaching, called no-clash teaching, together with the corresponding parameter $\mathrm{NCTD}(\mathcal{C})$. No-clash teaching is provably optimal in the strong sense that, given any concept class $\mathcal{C}$ and any model $M$ obeying Goldman and Mathias's collusion-freeness criterion, one obtains $\mathrm{NCTD}(\mathcal{C})\le M$-$\mathrm{TD}(\mathcal{C})$. We also study a corresponding notion $\mathrm{NCTD}^+$ for the case of learning from positive data only, establish useful bounds on $\mathrm{NCTD}$ and $\mathrm{NCTD}^+$, and discuss relations of these parameters to the VC-dimension and to sample compression. In addition to formulating an optimal model of collusion-free teaching, our main results are on the computational complexity of deciding whether $\mathrm{NCTD}^+(\mathcal{C})=k$ (or $\mathrm{NCTD}(\mathcal{C})=k$) for given $\mathcal{C}$ and $k$. We show some such decision problems to be equivalent to the existence question for certain constrained matchings in bipartite graphs. Our NP-hardness results for the latter are of independent interest in the study of constrained graph matchings.
Improved Generalization Bounds for Robust Learning
Attias, Idan, Kontorovich, Aryeh, Mansour, Yishay
We consider a model of robust learning in an adversarial environment. The learner gets uncorrupted training data with access to possible corruptions that may be affected by the adversary during testing. The learner's goal is to build a robust classifier that would be tested on future adversarial examples. We use a zero-sum game between the learner and the adversary as our game theoretic framework. The adversary is limited to $k$ possible corruptions for each input. Our model is closely related to the adversarial examples model of Schmidt et al. (2018); Madry et al. (2017). Our main results consist of generalization bounds for the binary and multi-class classification, as well as the real-valued case (regression). For the binary classification setting, we both tighten the generalization bound of Feige, Mansour, and Schapire (2015), and also are able to handle an infinite hypothesis class $H$. The sample complexity is improved from $O(\frac{1}{\epsilon^4}\log(\frac{|H|}{\delta}))$ to $O(\frac{1}{\epsilon^2}(k\log(k)VC(H)+\log\frac{1}{\delta}))$. Additionally, we extend the algorithm and generalization bound from the binary to the multiclass and real-valued cases. Along the way, we obtain results on fat-shattering dimension and Rademacher complexity of $k$-fold maxima over function classes; these may be of independent interest. For binary classification, the algorithm of Feige et al. (2015) uses a regret minimization algorithm and an ERM oracle as a blackbox; we adapt it for the multi-class and regression settings. The algorithm provides us with near-optimal policies for the players on a given training sample.
From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model
Saha, Aadirupa, Gopalan, Aditya
We consider PAC learning for identifying a good item from subset-wise samples in \pl\, probability models, with instance-dependent sample complexity performance. For the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner each time, we give the first $(\epsilon,\delta)$-PAC best item algorithm with an instance-dependent sample complexity bound. The algorithm relies on a wrapper that uses a weaker PAC algorithm with worst-case performance guarantees to adapt to the hardness of the input instance. The sample complexity is shown to be multiplicatively better depending on the length of rank-ordered feedback available in each subset play. We also give a new fixed-budget best-item algorithm for the \pl\, model along with an error bound. Numerical results of simulations of the algorithms are reported.
Private Center Points and Learning of Halfspaces
Beimel, Amos, Moran, Shay, Nissim, Kobbi, Stemmer, Uri
We present a private learner for halfspaces over an arbitrary finite domain $X\subset \mathbb{R}^d$ with sample complexity $mathrm{poly}(d,2^{\log^*|X|})$. The building block for this learner is a differentially private algorithm for locating an approximate center point of $m>\mathrm{poly}(d,2^{\log^*|X|})$ points -- a high dimensional generalization of the median function. Our construction establishes a relationship between these two problems that is reminiscent of the relation between the median and learning one-dimensional thresholds [Bun et al.\ FOCS '15]. This relationship suggests that the problem of privately locating a center point may have further applications in the design of differentially private algorithms. We also provide a lower bound on the sample complexity for privately finding a point in the convex hull. For approximate differential privacy, we show a lower bound of $m=\Omega(d+\log^*|X|)$, whereas for pure differential privacy $m=\Omega(d\log|X|)$.
Community-based 3-SAT Formulas with a Predefined Solution
Hu, Yamin, Luo, Wenjian, Wang, Junteng
It is crucial to generate crafted SAT formulas with predefined solutions for the testing and development of SAT solvers since many SAT formulas from real-world applications have solutions. Although some generating algorithms have been proposed to generate SAT formulas with predefined solutions, community structures of SAT formulas are not considered. We propose a 3-SAT formula generating algorithm that not only guarantees the existence of a predefined solution, but also simultaneously considers community structures and clause distributions. The proposed 3-SAT formula generating algorithm controls the quality of community structures through controlling (1) the number of clauses whose variables have a common community, which we call intra-community clauses, and (2) the number of variables that only belong to one community, which we call intra-community variables. To study the combined effect of community structures and clause distributions on the hardness of SAT formulas, we measure solving runtimes of two solvers, gluHack (a leading CDCL solver) and CPSparrow (a leading SLS solver), on the generated SAT formulas under different groups of parameter settings. Through extensive experiments, we obtain some noteworthy observations on the SAT formulas generated by the proposed algorithm: (1) The community structure has little or no effects on the hardness of SAT formulas with regard to CPSparrow but a strong effect with regard to gluHack. (2) Only when the proportion of true literals in a SAT formula in terms of the predefined solution is 0.5, SAT formulas are hard-to-solve with regard to gluHack; when this proportion is below 0.5, SAT formulas are hard-to-solve with regard to CPSparrow. (3) When the ratio of the number of clauses to that of variables is around 4.25, the SAT formulas are hard-to-solve with regard to both gluHack and CPSparrow.
Understanding Database Reconstruction Attacks on Public Data
There exists a solution universe of all the possible solutions to this set of constraints. If the solution universe contains a single possible solution, then the published statistics completely reveal the underlying confidential data--provided that noise was not added to either the microdata or the tabulations as a disclosure-avoidance mechanism. If there are multiple satisfying solutions, then any element (person) in common among all of the solutions is revealed. If the equations have no solution, either the set of published statistics is inconsistent with the fictional statistical agency's claim that it is tabulated from a real confidential database or an error was made in that tabulation. This doesn't mean that a high-quality reconstruction is not possible.
Where Do Human Heuristics Come From?
Human decision-making deviates from the optimal solution, that maximizes cumulative rewards, in many situations. Here we approach this discrepancy from the perspective of bounded rationality and our goal is to provide a justification for such seemingly sub-optimal strategies. More specifically we investigate the hypothesis, that humans do not know optimal decision-making algorithms in advance, but instead employ a learned, resource-bounded approximation. The idea is formalized through combining a recently proposed meta-learning model based on Recurrent Neural Networks with a resource-bounded objective. The resulting approach is closely connected to variational inference and the Minimum Description Length principle. Empirical evidence is obtained from a two-armed bandit task. Here we observe patterns in our family of models that resemble differences between individual human participants.
Differentially Private Learning of Geometric Concepts
Kaplan, Haim, Mansour, Yishay, Matias, Yossi, Stemmer, Uri
Machine learning algorithms have exciting and wide-range potential. However, as the data frequently containsensitive personal information, there are real privacy concerns associated with the development and the deployment of this technology. Motivated by this observation, the line of work on differentially private learning (initiated by [23]) aims to construct learning algorithms that provide strong (mathematically proven) privacy protections for the training data. Both government agenciesand industrial companies have realized the importance of introducing strong privacy protection to statistical and machine learning tasks. A few recent examples include Google [20] and Apple [27] that are already using differentially private estimation algorithms that feed into machine learning algorithms, and the US Census Bureau announcement that they will use differentially privatedata publication techniques in the next decennial census [1]. Differential privacy is increasingly accepted as a standard for rigorous privacy. We refer the reader to the excellent surveys in [17] and [28]. The definition of differential privacy is, Definition 1.1 ([16]). Let A be a randomized algorithm whose input is a sample.