Supervised Learning
Strategic Classification under Unknown Personalized Manipulation
We study the fundamental mistake bound and sample complexity in strategic classification, where agents can strategically manipulate their feature vectors to some extent in order to be predicted as positive. For example, given a classifier determining college admission, student candidates may try to take easier classes to improve their GPA, retake the SAT, and change schools in an effort to fool the classifier. Ball manipulations are a widely studied class of manipulations in the literature, where agents can modify their feature vector within a bounded radius ball. Unlike most prior work, our work considers manipulations to be personalized, meaning that agents can have different levels of manipulation ability (e.g., varying radii for ball manipulations) that are unknown to the learner. We formalize the learning problem in an interaction model where the learner first deploys a classifier and the agent then manipulates the feature vector within their manipulation set to game the deployed classifier. We investigate various scenarios in terms of the information available to the learner during the interaction, such as observing the original feature vector before or after deployment, observing the manipulated feature vector, or observing neither the original nor the manipulated feature vector. We begin by providing online mistake bounds and PAC sample complexity in these scenarios for ball manipulations. We also explore non-ball manipulations and show that, even in the simplest scenario where both the original and the manipulated feature vectors are revealed, the mistake bounds and sample complexity are lower bounded by Ω(|H|) when the target function belongs to a known class H.
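As a concrete illustration of the interaction model above, the following is a minimal sketch (our own, with assumed names and an assumed L2 ball, not the paper's construction) of one round: the learner deploys a linear classifier, and an agent with personal radius r moves just far enough to be predicted positive, if possible.

    import numpy as np

    def agent_best_response(x, w, b, r):
        """Ball manipulation: an agent classified negative moves up to
        distance r toward the boundary of the halfspace w.x + b >= 0."""
        score = w @ x + b
        if score >= 0:
            return x  # already predicted positive: no reason to manipulate
        dist = -score / np.linalg.norm(w)  # L2 distance to the boundary
        if dist <= r:
            return x + dist * w / np.linalg.norm(w)  # just crosses the boundary
        return x  # the boundary is unreachable within radius r

    # One round: learner deploys (w, b); the agent best-responds with radius r.
    rng = np.random.default_rng(0)
    w, b = np.array([1.0, -1.0]), -0.5
    x = rng.normal(size=2)                   # original feature vector
    v = agent_best_response(x, w, b, r=0.3)  # manipulated feature vector
    y_hat = int(w @ v + b >= 0)              # which of x, v, y_hat the learner sees varies by scenario

Note that the radius r is personal to the agent and, in the setting studied here, unknown to the learner.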
This paper proposes a new regularization method for structured prediction. The idea is relatively straightforward: a linear chain model is segmented into smaller subchains, each of which is added as an independent training example. Theorems are provided (with proofs in the supplement) showing how this regularization can reduce generalization risk and accelerate convergence rates. Empirical comparisons with state-of-the-art approaches suggest that the resulting method is both faster and more accurate. The accuracy improvements are small, but these are all well-studied tasks where small improvements can have impact.
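As we read it, the segmentation idea could look like the following sketch (the function name and the choice of overlapping windows are our assumptions):

    def subchain_examples(x, y, length):
        """Slice one labeled chain (x, y) into subchains of a fixed
        length, each added as an independent training example."""
        return [(x[i:i + length], y[i:i + length])
                for i in range(len(x) - length + 1)]

    pairs = subchain_examples("the cat sat".split(), ["DT", "NN", "VB"], 2)
    # [(['the', 'cat'], ['DT', 'NN']), (['cat', 'sat'], ['NN', 'VB'])]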
Clone-Resistant Weights in Metric Spaces: A Framework for Handling Redundancy Bias
Berriaud, Damien, Wattenhofer, Roger
We are given a set of elements in a metric space. The distribution of the elements is arbitrary, possibly adversarial. Can we weigh the elements in a way that is resistant to such (adversarial) manipulations? This problem arises in various contexts. For instance, the elements could represent data points, requiring robust domain adaptation. Alternatively, they might represent tasks to be aggregated into a benchmark; or questions about personal political opinions in voting advice applications. This article introduces a theoretical framework for dealing with such problems. We propose clone-proof representation functions as a solution concept. These functions distribute importance across elements of a set such that similar objects ("clones") share (some of) their weights, thus avoiding a potential bias introduced by their multiplicity. Our framework extends the maximum uncertainty principle to accommodate general metric spaces and includes a set of axioms (symmetry, continuity, and clone-proofness) that guide the construction of representation functions. Finally, we address the existence of representation functions satisfying our axioms in the significant case of Euclidean spaces and propose a general method for their construction.
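To make clone-proofness concrete, here is a toy weighting (our illustration, not the authors' representation functions): each eps-neighborhood shares roughly one unit of weight, so duplicating an element splits its share rather than doubling its influence.

    import numpy as np

    def clone_aware_weights(points, eps):
        """Toy clone-aware weighting in a metric space: a point's raw
        weight is inversely proportional to the size of its
        eps-neighborhood, then weights are normalized to sum to one."""
        pts = np.asarray(points, dtype=float)
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        cluster_size = (d <= eps).sum(axis=1)
        raw = 1.0 / cluster_size
        return raw / raw.sum()

    w = clone_aware_weights([[0.0], [0.01], [5.0]], eps=0.1)
    # ~[0.25, 0.25, 0.5]: the two near-clones share half; the isolated point keeps half

Note that this naive rule is not continuous in the elements' positions (weights jump as a point crosses the eps threshold), which hints at why satisfying all three axioms simultaneously is nontrivial.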
Reviews: Graph Structured Prediction Energy Networks
Building on the structured SVM, the authors combine structured prediction with hinge-loss learning, resulting in a novel model, Graph Structured Prediction Energy Networks. Overall the model is novel and the theory is mostly solid. However, I have some concerns about the inference part. 1. Marginal polytope. The relaxation of the marginal polytope is always tricky for structured prediction. A loose relaxation might yield an efficient algorithm, but at the cost of solution quality.
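For reference, the standard workaround in question replaces the marginal polytope $\mathbb{M}$ with the local polytope of pairwise-consistent pseudo-marginals (a sketch in our notation):

$$\mathbb{L} = \Big\{\, \mu \ge 0 \;:\; \sum_{x_i} \mu_i(x_i) = 1 \;\;\forall i, \quad \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) \;\;\forall (i,j) \in E \,\Big\} \;\supseteq\; \mathbb{M}.$$

Optimizing over $\mathbb{L}$ is tractable, but its fractional vertices are exactly where a loose relaxation can degrade solution quality, which is the concern raised above.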
Reviews: Graph Structured Prediction Energy Networks
All the reviewers thought that generalizing the structured prediction energy network (SPEN) to incorporate factored potentials (following graph structure), together with the proposed approximate inference schemes for structured prediction, makes a nice contribution to NeurIPS. The extensive experiments were lauded, but concerns were expressed about the theoretical backing of the methods. After discussion and looking at the paper, the AC agrees with R2 that the paper makes an interesting practical contribution, and that the theory could be clarified in follow-up work. The authors should include their timing results as well as additional clarification from the rebuttal in their camera-ready version. Additional side note: [*] from the rebuttal should be mentioned in the main paper as a way to handle the entropy term over the marginal polytope in a principled manner with Frank-Wolfe.
Invariant Kernels: Rank Stabilization and Generalization Across Dimensions
Díaz, Mateo, Drusvyatskiy, Dmitriy, Kendrick, Jack, Thomas, Rekha R.
Symmetry often arises when learning from high-dimensional data. For example, data sets consisting of point clouds, graphs, and unordered sets appear routinely in contemporary applications, and exhibit rich underlying symmetries. Understanding the benefits of symmetry for the statistical and numerical efficiency of learning algorithms is an active area of research. In this work, we show that symmetry has a pronounced impact on the rank of kernel matrices. Specifically, we compute the rank of a polynomial kernel of fixed degree that is invariant under various groups acting independently on its two arguments. In concrete circumstances, including the three aforementioned examples, symmetry dramatically decreases the rank, making it independent of the data dimension. In such settings, we show that a simple regression procedure is minimax optimal for estimating an invariant polynomial from finitely many samples drawn across different dimensions. We complete the paper with numerical experiments that illustrate our findings.
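As an illustrative instance (ours, not the paper's exact construction) of a kernel invariant under a group acting independently on its two arguments, one can symmetrize a polynomial kernel by brute-force averaging over the permutation group:

    import numpy as np
    from itertools import permutations

    def invariant_poly_kernel(x, y, degree=2):
        """k_G(x, y) = average over g, h in S_n of ((g.x) . (h.y) + 1)^degree.
        Invariant under permutations applied independently to x and y;
        brute force, so feasible only for small n."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        perms = [list(p) for p in permutations(range(len(x)))]
        vals = [(np.dot(x[g], y[h]) + 1.0) ** degree
                for g in perms for h in perms]
        return float(np.mean(vals))

    k = invariant_poly_kernel([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])

The rank results concern kernel matrices built from symmetrized kernels of this kind, whose rank can stop growing with the ambient dimension.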
Learning Strategy-Aware Linear Classifiers
We address the question of repeatedly learning linear classifiers against agents who are strategically trying to game the deployed classifiers, and we use the Stackelberg regret to measure the performance of our algorithms. First, we show that Stackelberg and external regret for the problem of strategic classification are strongly incompatible: there exist worst-case scenarios where any sequence of actions providing sublinear external regret might result in linear Stackelberg regret, and vice versa. Second, we present a strategy-aware algorithm for minimizing the Stackelberg regret for which we prove nearly matching upper and lower regret bounds. Finally, we provide simulations to complement our theoretical analysis. Our results advance the growing literature of learning from revealed preferences, which has so far focused on "smoother" assumptions from the perspective of the learner and the agents, respectively.
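For orientation, the benchmark can be written schematically (our notation, not necessarily the paper's) as

$$R_{\mathrm{Stack}}(T) \;=\; \sum_{t=1}^{T} \ell\big(f_t, \mathrm{BR}_t(f_t)\big) \;-\; \min_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell\big(f, \mathrm{BR}_t(f)\big),$$

where $\mathrm{BR}_t(f)$ denotes agent $t$'s best response to a deployed classifier $f$. External regret instead evaluates each comparator $f$ on the responses $\mathrm{BR}_t(f_t)$ that were actually induced; since switching the comparator changes the agents' responses under the Stackelberg benchmark but not under the external one, the two notions can diverge, which underlies the incompatibility established here.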
An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint
This paper introduces an efficient and robust framework for hybrid query (HQ) processing, which combines approximate nearest neighbor search (ANNS) with attribute constraint. HQ aims to find objects that are similar to a feature vector and match some structured attributes. Existing methods handle ANNS and attribute filtering separately, leading to inefficiency and inaccuracy. Our framework, called native hybrid query (NHQ), builds a composite index based on proximity graph (PG) and applies joint pruning for HQ. We can easily adapt existing PGs to this framework for efficient HQ processing. We also propose two new navigable PGs (NPGs) with optimized edge selection and routing, which improve the overall ANNS performance. We implement five HQ methods based on the proposed NPGs and existing PGs in NHQ, and show that they outperform the state-of-the-art methods on 10 real-world datasets (up to 315× faster with the same accuracy).
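A minimal sketch of the joint-pruning idea (hypothetical names and a simplistic stopping rule; not the paper's implementation): route greedily over the composite graph index using a fused distance, so candidates violating the attribute constraint are pruned during the search rather than filtered before or after it.

    import heapq
    import numpy as np

    def fused_distance(q_vec, q_attrs, vec, attrs, penalty=10.0):
        """Feature distance plus a penalty per mismatched attribute."""
        mismatch = sum(a != b for a, b in zip(q_attrs, attrs))
        return float(np.linalg.norm(np.asarray(q_vec) - np.asarray(vec))) \
               + penalty * mismatch

    def greedy_search(graph, vectors, attrs, q_vec, q_attrs, entry, k=10, budget=200):
        """Best-first routing over a proximity-graph composite index."""
        d0 = fused_distance(q_vec, q_attrs, vectors[entry], attrs[entry])
        frontier, visited, results = [(d0, entry)], {entry}, []
        while frontier and len(visited) < budget:
            d, u = heapq.heappop(frontier)
            results.append((d, u))
            for v in graph[u]:  # neighbors in the composite index
                if v not in visited:
                    visited.add(v)
                    dv = fused_distance(q_vec, q_attrs, vectors[v], attrs[v])
                    heapq.heappush(frontier, (dv, v))
        return sorted(results)[:k]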
Moment Distributionally Robust Tree Structured Prediction
Structured prediction of tree-shaped objects is heavily studied under the name of syntactic dependency parsing. Current practice based on maximum likelihood or margin is either agnostic to or inconsistent with the evaluation loss. Risk minimization alleviates the discrepancy between training and test objectives but typically induces a non-convex problem. These approaches adopt explicit regularization to combat overfitting without probabilistic interpretation. We propose a moment-based distributionally robust optimization approach for tree structured prediction, where the worst-case expected loss over a set of distributions within bounded moment divergence from the empirical distribution is minimized. We develop efficient algorithms for arborescences and other variants of trees. We derive Fisher consistency, convergence rates and generalization bounds for our proposed method. We evaluate its empirical effectiveness on dependency parsing benchmarks.
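Schematically (our notation; the paper's exact ambiguity set may differ), the training objective is

$$\min_{\theta} \; \sup_{Q \in \mathcal{Q}_\rho} \; \mathbb{E}_{Q}\big[\ell(\theta; X, Y)\big], \qquad \mathcal{Q}_\rho = \big\{\, Q \;:\; D\big(\mathbb{E}_{Q}[\phi(X, Y)],\; \mathbb{E}_{\hat{P}_n}[\phi(X, Y)]\big) \le \rho \,\big\},$$

where $\hat{P}_n$ is the empirical distribution, $\phi$ collects the moment features, and $\rho$ bounds the allowed moment divergence; regularization thus arises from the ambiguity set itself rather than an ad hoc penalty, which gives it a probabilistic interpretation.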
Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks
In structured output prediction tasks, labeling ground-truth training output is often expensive. However, for many tasks, even when the true output is unknown, we can evaluate predictions using a scalar reward function, which may be easily assembled from human knowledge or non-differentiable pipelines. But searching through the entire output space to find the best output with respect to this reward function is typically intractable. In this paper, we instead use efficient truncated randomized search in this reward function to train structured prediction energy networks (SPENs), which provide efficient test-time inference using gradient-based search on a smooth, learned representation of the score landscape, and have previously yielded state-of-the-art results in structured prediction. In particular, this truncated randomized search in the reward function yields previously unknown local improvements, providing effective supervision to SPENs and avoiding their traditional need for labeled training data.
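A minimal sketch of the supervision loop as we understand it (the names, the neighbor sampler, and the pairing rule are our assumptions):

    def truncated_random_search(y0, reward, neighbors, steps=10, samples=5):
        """Truncated hill-climbing in reward space: sample a few random
        neighbors of the current output, move only on improvement."""
        y, best = y0, reward(y0)
        for _ in range(steps):
            y_new = max((neighbors(y) for _ in range(samples)), key=reward)
            if reward(y_new) <= best:
                break  # truncate once no local improvement is found
            y, best = y_new, reward(y_new)
        return y

    def ranked_pair(y_pred, reward, neighbors):
        """If search beats the model's prediction, the ranked pair
        (y_better, y_pred) can drive a margin-style energy update."""
        y_better = truncated_random_search(y_pred, reward, neighbors)
        return (y_better, y_pred) if reward(y_better) > reward(y_pred) else None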