Supervised Learning
Stochastic Structured Prediction under Bandit Feedback
Sokolov, Artem, Kreutzer, Julia, Riezler, Stefan, Lo, Christopher
Stochastic structured prediction under bandit feedback follows a learning protocol where on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback. Papers published at the Neural Information Processing Systems Conference.
Weakly-supervised Discovery of Visual Pattern Configurations
Song, Hyun Oh, Lee, Yong Jae, Jegelka, Stefanie, Darrell, Trevor
The prominence of weakly labeled data gives rise to a growing demand for object detection methods that can cope with minimal supervision. We propose an approach that automatically identifies discriminative configurations of visual patterns that are characteristic of a given object class. We formulate the problem as a constrained submodular optimization problem and demonstrate the benefits of the discovered configurations in remedying mislocalizations and finding informative positive and negative training examples. Papers published at the Neural Information Processing Systems Conference.
Active Nearest-Neighbor Learning in Metric Spaces
Kontorovich, Aryeh, Sabato, Sivan, Urner, Ruth
We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure. Papers published at the Neural Information Processing Systems Conference.
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
Osokin, Anton, Bach, Francis, Lacoste-Julien, Simon
We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction. Papers published at the Neural Information Processing Systems Conference.
Object Localization based on Structural SVM using Privileged Information
Feyereisl, Jan, Kwak, Suha, Son, Jeany, Han, Bohyung
We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information. Privileged information provides useful high-level knowledge for image understanding and facilitates learning a reliable model even with a small number of training examples. In our setting, we assume that such information is available only at training time since it may be difficult to obtain from visual data accurately without human supervision. Our goal is to improve performance by incorporating privileged information into ordinary learning framework and adjusting model parameters for better generalization. We tackle object localization problem based on a novel structural SVM using privileged information, where an alternating loss-augmented inference procedure is employed to handle the term in the objective function corresponding to privileged information.
A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings
Ciliberto, Carlo, Rosasco, Lorenzo, Rudi, Alessandro
We propose and analyze a novel theoretical and algorithmic framework for structured prediction. While so far the term has referred to discrete output spaces, here we consider more general settings, such as manifolds or spaces of probability measures. We define structured prediction as a problem where the output space lacks a vectorial structure. We identify and study a large class of loss functions that implicitly defines a suitable geometry on the problem. The latter is the key to develop an algorithmic framework amenable to a sharp statistical analysis and yielding efficient computations. When dealing with output spaces with infinite cardinality, a suitable implicit formulation of the estimator is shown to be crucial.
The exponentially weighted average forecaster in geodesic spaces of non-positive curvature
The problem of prediction with expert advice [ Cesa-Bianchi and Lugosi, 2006 ] is a by now standard model of online learning. Traditionally studied for outcom es taking values in a vector space, less seems to be known when the outcome space is a more general metr ic space. This paper partly addresses the problem by focusing on the case of NPC spaces, i .e., geodesic metric spaces with non-positive curvature in the sense of Alexandrov. The class of NPC spaces includes many metric spaces of partic ular interest in the data sciences. Apart from Hilbert spaces, interesting examples are hyperb olic spaces [ Nickel and Kiela, 2017 ], the space of real symmetric positive-definite matrices with Log -Euclidean [ Arsigny et al., 2007 ] or Log-Cholesky [ Lin, 2019 ] Riemannian metrics and more generally all complete and sim ply connected Riemannian manifolds with non-positive sectional curvatu re.
Simple and Effective Graph Autoencoders with One-Hop Linear Models
Salha, Guillaume, Hennequin, Romain, Vazirgiannis, Michalis
Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged as powerful node embedding methods, with promising performances on challenging tasks such as link prediction and node clustering. Graph AE, VAE and most of their extensions rely on graph convolutional networks (GCN) encoders to learn vector space representations of nodes. In this paper, we propose to replace the GCN encoder by a significantly simpler linear model w.r.t. the direct neighborhood (one-hop) adjacency matrix of the graph. For the two aforementioned tasks, we show that this approach consistently reaches competitive performances w.r.t. GCN-based models for numerous real-world graphs, including all benchmark datasets commonly used to evaluate graph AE and VAE. We question the relevance of repeatedly using these datasets to compare complex graph AE and VAE. We also emphasize the effectiveness of the proposed encoding scheme, that appears as a simpler and faster alternative to GCN encoders for many real-world applications.
Aggregation over Metric Spaces: Proposing and Voting in Elections, Budgeting, and Legislation
Bulteau, Laurent, Shahaf, Gal, Shapiro, Ehud, Talmon, Nimrod
We present a unifying framework encompassing many social choice settings. Viewing each social choice setting as voting in a suitable metric space, we consider a general model of social choice over metric spaces, in which---similarly to the spatial model of elections---each voter specifies an ideal element of the metric space. The ideal element functions as a vote, where each voter prefers elements that are closer to her ideal element. But it also functions as a proposal, thus making all participants equal not only as voters but also as proposers. We consider Condorcet aggregation and a continuum of solution concepts, ranging from minimizing the sum of distances to minimizing the maximum distance. We study applications of the abstract model to various social choice settings, including single-winner elections, committee elections, participatory budgeting, and participatory legislation. For each setting, we compare each solution concept to known voting rules and study various properties of the resulting voting rules. Our framework provides expressive aggregation for a broad range of social choice settings while remaining simple for voters, and may enable a unified and integrated implementation for all these settings, as well as unified extensions such as sybil-resiliency, proxy voting, and deliberative decision making.
LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
Niculae, Vlad, Martins, André F. T.
Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some downstream applications. However, SparseMAP requires a tractable MAP inference oracle. This excludes, e.g., loopy graphical models or factor graphs with logic constraints, which generally require approximate inference. In this paper, we introduce LP-SparseMAP, an extension of SparseMAP that addresses this limitation via a local polytope relaxation. LP-SparseMAP uses the flexible and powerful domain specific language of factor graphs for defining and backpropagating through arbitrary hidden structure, supporting coarse decompositions, hard logic constraints, and higher-order correlations. We derive the forward and backward algorithms needed for using LP-SparseMAP as a hidden or output layer. Experiments in three structured prediction tasks show benefits compared to SparseMAP and Structured SVM.