Supervised Learning
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Bahmani, Zeinab, Bertossi, Leopoldo, Vasiloglou, Nikolaos
Appendix A. Relational MDs and the UCI Property Here, we formally extend the class of matching dependencies (MDs) introduced in Section 2.1, which we will call classical MDs, to the larger class of relational MDs. This extension is motivated by the application of MDs to blocking for entity resolution, but applications can be easily foreseen in other areas where declarative relational knowledge may be useful in combination with matching and merging. We also identify classes of relational MDs for which a single clean instance exists, no matter how the MDs are enforced, that can be computed through the chase procedure in polynomial time in the size of the database on which the MDs are enforced. We say that the MDs (in some cases in combination with an initial instance) have the unique clean instance property (UCI property). More details can be found in [11, 6, 7]. Definition 1. the form: Given a relational schema R, a relational MD is a formula of ฯ: t
A Consistent Regularization Approach for Structured Prediction
Ciliberto, Carlo, Rosasco, Lorenzo, Rudi, Alessandro
We propose and analyze a regularization approach for structured prediction problems. We characterize a large class of loss functions that allows to naturally embed structured outputs in a linear space. We exploit this fact to design learning algorithms using a surrogate loss approach and regularization techniques. We prove universal consistency and finite sample bounds characterizing the generalization properties of the proposed method. Experimental results are provided to demonstrate the practical usefulness of the proposed approach.
Linear Relaxations for Finding Diverse Elements in Metric Spaces
Bhaskara, Aditya, Ghadiri, Mehrdad, Mirrokni, Vahab, Svensson, Ola
Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximize diversity, while obeying a cardinality (or more generally, matroid) constraint. The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as {\em sum-min} diversity, which is known to be effective in many applications, and give the first constant factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the so-called {\em planted clique} assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective. Then, we compare the quality of the solutions produced by our method with other popular diversity maximization algorithms.
Discriminative Gaifman Models
We present discriminative Gaifman models, a novel family of relational machine learning models. Gaifman models learn feature representations bottom up from representations of locally connected and bounded-size regions of knowledge bases (KBs). Considering local and bounded-size neighborhoods of knowledge bases renders logical inference and learning tractable, mitigates the problem of overfitting, and facilitates weight sharing. Gaifman models sample neighborhoods of knowledge bases so as to make the learned relational models more robust to missing objects and relations which is a common situation in open-world KBs. We present the core ideas of Gaifman models and apply them to large-scale relational learning problems. We also discuss the ways in which Gaifman models relate to some existing relational machine learning approaches.
Improved Error Bounds for Tree Representations of Metric Spaces
Chowdhury, Samir, Mรฉmoli, Facundo, Smith, Zane T.
Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov's method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown.
Structured Prediction Theory Based on Factor Graph Complexity
Cortes, Corinna, Kuznetsov, Vitaly, Mohri, Mehryar, Yang, Scott
We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, \emph{factor graph complexity}, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets, and a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to devise two new algorithms, \emph{Voted Conditional Random Field} (VCRF) and \emph{Voted Structured Boosting} (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.
Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Deep metric learning has gained much popularity in recent years, following the success of deep learning. However, existing frameworks of deep metric learning based on contrastive loss and triplet loss often suffer from slow convergence, partially because they employ only one negative example while not interacting with the other negative classes in each update. In this paper, we propose to address this problem with a new metric learning objective called multiclassN -pair loss . The proposed objective function firstly generalizes triplet loss by allowing joint comparison among more than one negative examples - more specifically,N -1 negative examples - and secondly reduces the computational burden of evaluating deep embedding vectors via an efficient batch construction strategy using onlyN pairs of examples, instead of (N 1) N . We demonstrate the superiority of our proposed loss to the triplet loss as well as other competing loss functions for a variety of tasks on several visual recognition benchmark, including fine-grained object recognition and verification, image clustering and retrieval, and face verification and identification.
Stochastic Structured Prediction under Bandit Feedback
Sokolov, Artem, Kreutzer, Julia, Riezler, Stefan, Lo, Christopher
Stochastic structured prediction under bandit feedback follows a learning protocol where on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
Active Nearest-Neighbor Learning in Metric Spaces
Kontorovich, Aryeh, Sabato, Sivan, Urner, Ruth
We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games
Balandat, Maximilian, Krichene, Walid, Tomlin, Claire, Bayen, Alexandre
We study a general adversarial online learning problem, in which we are given a decision set X' in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from X', based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either the set S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency.