Supervised Learning
Stochastic Structured Prediction under Bandit Feedback
Sokolov, Artem, Kreutzer, Julia, Riezler, Stefan, Lo, Christopher
Stochastic structured prediction under bandit feedback follows a learning protocol where, on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
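As a concrete illustration of the protocol, here is a minimal sketch of one stochastic update for an expected-loss objective, assuming a linear Gibbs model over a small, enumerable output space (the paper works with exponential output spaces) and hypothetical helpers `phi` and `task_loss`:

```python
import numpy as np

def bandit_structured_update(w, x, outputs, phi, task_loss, lr=0.1):
    """One bandit-feedback update for a linear Gibbs model p(y|x) proportional to exp(w . phi(x, y)).

    outputs:   small, enumerable list of candidate structures (illustration only)
    phi:       feature map phi(x, y) -> np.ndarray
    task_loss: oracle returning the loss of the single predicted structure (bandit feedback)
    """
    feats = np.array([phi(x, y) for y in outputs])
    scores = feats @ w
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    # Predict by sampling one structure and observe only its task loss (partial feedback).
    i = np.random.choice(len(outputs), p=probs)
    loss = task_loss(outputs[i])

    # Score-function (stochastic first-order) gradient estimate of the expected loss.
    grad = loss * (feats[i] - probs @ feats)
    return w - lr * grad
```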
Linear Relaxations for Finding Diverse Elements in Metric Spaces
Bhaskara, Aditya, Ghadiri, Mehrdad, Mirrokni, Vahab, Svensson, Ola
Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximizes diversity, while obeying a cardinality (or more generally, matroid) constraint. The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as {\em sum-min} diversity, which is known to be effective in many applications, and give the first constant-factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the so-called {\em planted clique} assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective. Then, we compare the quality of the solutions produced by our method with other popular diversity maximization algorithms.
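For reference, the sum-min objective assigns to a subset S the value sum over u in S of the distance from u to its nearest other point in S. The sketch below evaluates this objective and runs a naive greedy baseline under a cardinality constraint; it is illustrative only, not the paper's LP rounding algorithm:

```python
import numpy as np

def sum_min_diversity(points, subset_idx):
    """Sum over u in S of min_{v in S, v != u} d(u, v), with Euclidean d; points is an (n, d) array."""
    S = points[list(subset_idx)]
    dists = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude the v = u term
    return dists.min(axis=1).sum()

def greedy_sum_min(points, k):
    """Naive greedy baseline: repeatedly add the point that most increases the objective."""
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(points)) if i not in chosen]
        best = max(rest, key=lambda i: sum_min_diversity(points, chosen + [i]))
        chosen.append(best)
    return chosen
```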
Active Nearest-Neighbor Learning in Metric Spaces
Kontorovich, Aryeh, Sabato, Sivan, Urner, Ruth
We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
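The surrounding pool-based protocol can be illustrated with a generic sketch: a 1-NN classifier plus a simple farthest-from-labeled query rule. This is not MARMANN's compression-based selection or model-selection procedure, only the kind of active-learning loop such a learner plugs into:

```python
import numpy as np

def active_nn_round(X_pool, labeled_idx, budget):
    """One generic pool-based active-learning round.

    Queries the pool points whose nearest labeled neighbor is farthest away
    (a simple uncertainty proxy); illustrative only.
    """
    L = X_pool[labeled_idx]
    dists = np.linalg.norm(X_pool[:, None, :] - L[None, :, :], axis=-1)
    nearest = dists.min(axis=1)
    nearest[labeled_idx] = -np.inf            # never re-query labeled points
    return np.argsort(-nearest)[:budget]      # farthest-from-labeled first

def predict_1nn(X_pool, labeled_idx, labels, X_test):
    """Nearest-neighbor classifier built on the currently labeled pool points."""
    L, yL = X_pool[labeled_idx], labels[labeled_idx]
    d = np.linalg.norm(X_test[:, None, :] - L[None, :, :], axis=-1)
    return yL[d.argmin(axis=1)]
```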
Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Deep metric learning has gained much popularity in recent years, following the success of deep learning. However, existing frameworks of deep metric learning based on contrastive loss and triplet loss often suffer from slow convergence, partially because they employ only one negative example while not interacting with the other negative classes in each update. In this paper, we propose to address this problem with a new metric learning objective called multi-class N-pair loss. The proposed objective function firstly generalizes triplet loss by allowing joint comparison among more than one negative example, more specifically, N-1 negative examples, and secondly reduces the computational burden of evaluating deep embedding vectors via an efficient batch construction strategy using only N pairs of examples, instead of (N+1)×N. We demonstrate the superiority of our proposed loss to the triplet loss as well as other competing loss functions for a variety of tasks on several visual recognition benchmarks, including fine-grained object recognition and verification, image clustering and retrieval, and face verification and identification.
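For reference, a minimal NumPy sketch of the multi-class N-pair loss in its common formulation, assuming an already-embedded batch of N (anchor, positive) pairs, one pair per class (names are illustrative):

```python
import numpy as np

def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss for N (anchor, positive) embedding pairs.

    anchors, positives: arrays of shape (N, d); row i of each comes from class i.
    loss_i = log(1 + sum_{j != i} exp(a_i . p_j - a_i . p_i))
    """
    logits = anchors @ positives.T          # (N, N) similarity matrix
    pos = np.diag(logits)[:, None]          # a_i . p_i
    diffs = logits - pos                    # a_i . p_j - a_i . p_i
    np.fill_diagonal(diffs, -np.inf)        # exclude the j = i term
    losses = np.log1p(np.exp(diffs).sum(axis=1))
    return losses.mean()
```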
Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games
Balandat, Maximilian, Krichene, Walid, Tomlin, Claire, Bayen, Alexandre
We study a general adversarial online learning problem, in which we are given a decision set X' in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from X', based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either the set S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency.
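In the finite-dimensional special case where the decision set is the probability simplex over n points and the regularizer is negative entropy, Dual Averaging reduces to exponential weights. A minimal sketch of that special case follows; it is illustrative only, not the paper's infinite-dimensional construction:

```python
import numpy as np

def dual_averaging_simplex(reward_rounds, etas=None):
    """Dual Averaging (Follow the Regularized Leader) with an entropic regularizer on the simplex.

    reward_rounds: sequence of reward vectors u_t in R^n (np.ndarray).
    Plays x_t proportional to exp(eta_t * sum_{s<t} u_s); returns the total realized reward.
    """
    rewards = list(reward_rounds)
    n = len(rewards[0])
    cum = np.zeros(n)
    total = 0.0
    for t, u in enumerate(rewards, start=1):
        eta = np.sqrt(np.log(n) / t) if etas is None else etas[t - 1]
        x = np.exp(eta * (cum - cum.max()))   # stabilized exponential weights
        x /= x.sum()
        total += float(x @ u)
        cum += u
    return total
```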
Improved Error Bounds for Tree Representations of Metric Spaces
Chowdhury, Samir, Mémoli, Facundo, Smith, Zane T.
Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov's method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown.
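Since the results are transported to SLHC, it may help to recall that single linkage produces an ultrametric given by the dendrogram merge heights, which is the object the additive distortion bound concerns. A small SciPy sketch that extracts this ultrametric from a distance matrix (illustrative, not Gromov's embedding itself):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def slhc_ultrametric(dist_matrix):
    """Return the single-linkage ultrametric u(x, y) = height at which x and y merge."""
    n = dist_matrix.shape[0]
    Z = linkage(squareform(dist_matrix, checks=False), method="single")
    # Track cluster membership as merges occur; merge node at step i gets index n + i.
    members = {i: [i] for i in range(n)}
    u = np.zeros((n, n))
    for step, (a, b, height, _) in enumerate(Z):
        A, B = members.pop(int(a)), members.pop(int(b))
        for i in A:
            for j in B:
                u[i, j] = u[j, i] = height
        members[n + step] = A + B
    return u
```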
Structured Prediction Theory Based on Factor Graph Complexity
Cortes, Corinna, Kuznetsov, Vitaly, Mohri, Mehryar, Yang, Scott
We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, \emph{factor graph complexity}, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets, and a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to devise two new algorithms, \emph{Voted Conditional Random Field} (VCRF) and \emph{Voted Structured Boosting} (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.
A Consistent Regularization Approach for Structured Prediction
Ciliberto, Carlo, Rosasco, Lorenzo, Rudi, Alessandro
We propose and analyze a regularization approach for structured prediction problems. We characterize a large class of loss functions that allows structured outputs to be naturally embedded in a linear space. We exploit this fact to design learning algorithms using a surrogate loss approach and regularization techniques. We prove universal consistency and finite sample bounds characterizing the generalization properties of the proposed method. Experimental results are provided to demonstrate the practical usefulness of the proposed approach.
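One natural instantiation of this two-step recipe uses kernel ridge regression for the surrogate step and loss-weighted decoding over a candidate set for prediction; the specific choices and names below are illustrative, not necessarily the paper's exact estimator:

```python
import numpy as np

def fit_alpha(K_train, lam=1e-2):
    """Surrogate step: kernel ridge regression weights, alpha(x) = (K + n*lam*I)^{-1} k_x."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + n * lam * np.eye(n), np.eye(n))

def predict_structured(W, k_x, train_outputs, candidates, delta):
    """Decoding step: y_hat = argmin over candidates y of sum_i alpha_i(x) * delta(y, y_i).

    k_x:   vector of kernel values between the test input and the training inputs
    delta: structured loss delta(y, y_i)
    """
    alpha = W @ k_x                                   # weights over training points
    scores = [sum(a * delta(y, yi) for a, yi in zip(alpha, train_outputs))
              for y in candidates]
    return candidates[int(np.argmin(scores))]
```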