Supplementary material: Weston-Watkins Hinge Loss and Ordered Partitions

Neural Information Processing Systems

References to contents in the supplementary material are prefixed with an "S", e.g., Lemma S3.20. References to contents in the main article do not have any prefix, e.g., Theorem 1.3 and Section 1.4. We introduce notations in addition to those already defined in the main article's Section 1.3. The exception is the last section, Section S6.2, where the explicit ... All vectors are column vectors unless stated otherwise. If j = 1, then the result is trivial. Now, for the first case, suppose that y ∉ {1, j}.





Consistency Conditions for Differentiable Surrogate Losses

Khurana, Drona, Thilagar, Anish, Kimpara, Dhamma, Frongillo, Rafael

arXiv.org Machine Learning

The statistical consistency of surrogate losses for discrete prediction tasks is often checked via the condition of calibration. However, directly verifying calibration can be arduous. Recent work shows that for polyhedral surrogates, a less arduous condition, indirect elicitation (IE), is still equivalent to calibration. We give the first results of this type for non-polyhedral surrogates, specifically the class of convex differentiable losses. We first prove that under mild conditions, IE and calibration are equivalent for one-dimensional losses in this class. We construct a counter-example that shows that this equivalence fails in higher dimensions. This motivates the introduction of strong IE, a strengthened form of IE that is equally easy to verify. We establish that strong IE implies calibration for differentiable surrogates and is both necessary and sufficient for strongly convex, differentiable surrogates. Finally, we apply these results to a range of problems to demonstrate the power of IE and strong IE for designing and analyzing consistent differentiable surrogates.
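
As background for the abstract above, calibration is typically formalized pointwise over label distributions. A standard rendering in the surrogate-risk literature (the notation here — surrogate L, link psi, target loss ell, optimal-report set Gamma(p) — is assumed for illustration, and the paper's exact definitions may differ in detail):

    % Calibration: no sequence of surrogate reports can approach the optimal
    % surrogate risk while linking to a suboptimal report for the target loss,
    % for every label distribution p.
    \[
    \inf_{u \,:\, \psi(u) \notin \Gamma(p)} \mathbb{E}_{Y \sim p}\, L(u, Y)
    \;>\; \inf_{u} \mathbb{E}_{Y \sim p}\, L(u, Y),
    \qquad
    \Gamma(p) := \arg\min_{r} \mathbb{E}_{Y \sim p}\, \ell(r, Y).
    \]

Indirect elicitation, by contrast, roughly asks only that exact surrogate minimizers link to optimal target reports, which is one reason it is easier to verify.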


On the connection between Bregman divergence and value in regularized Markov decision processes

O'Donoghue, Brendan

arXiv.org Artificial Intelligence

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others. The main result of this manuscript holds more generally, but for brevity we shall restrict ourselves to this case. To prove our main result we require a slight generalization of the performance difference lemma (PDL) [1] to cover the regularized MDP case. The proof of this identity is included in the appendix for completeness.
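
For context, the Bregman divergence generated by a strictly convex, differentiable function \Phi is defined as follows (a standard definition; the note's own regularizer and notation may differ):

    \[
    D_\Phi(x, y) \;=\; \Phi(x) - \Phi(y) - \langle \nabla \Phi(y),\, x - y \rangle .
    \]

With the negative-entropy generator \Phi(x) = \sum_i x_i \log x_i on the probability simplex, this specializes to the KL divergence D_\Phi(x, y) = \sum_i x_i \log(x_i / y_i), the case that arises in entropy-regularized MDPs.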


Weston-Watkins Hinge Loss and Ordered Partitions

Wang, Yutong, Scott, Clayton D.

arXiv.org Machine Learning

Multiclass extensions of the support vector machine (SVM) have been formulated in a variety of ways. A recent empirical comparison of nine such formulations [Doğan et al. 2016] recommends the variant proposed by Weston and Watkins (WW), despite the fact that the WW-hinge loss is not calibrated with respect to the 0-1 loss. In this work we introduce a novel discrete loss function for multiclass classification, the ordered partition loss, and prove that the WW-hinge loss is calibrated with respect to this loss. We also argue that the ordered partition loss is maximally informative among discrete losses satisfying this property. Finally, we apply our theory to justify the empirical observation made by Doğan et al. that the WW-SVM can work well even under massive label noise, a challenging setting for multiclass SVMs.
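
For concreteness, the WW hinge loss on a score vector f in R^k with true class y is usually written as sum over j != y of max(0, 1 - (f_y - f_j)). A minimal NumPy sketch of this standard form (the paper's normalization may differ slightly):

    import numpy as np

    def ww_hinge(scores, y):
        """Weston-Watkins hinge loss for one example.
        scores: (k,) array of class scores; y: index of the true class."""
        margins = 1.0 - (scores[y] - scores)  # 1 - (f_y - f_j) for each j
        margins[y] = 0.0                      # the j = y term is excluded
        return np.maximum(0.0, margins).sum()

    # True class 0 is ranked first but with a small margin over class 1,
    # so only the j = 1 term contributes: 0.8 (class 1) + 0 (class 2).
    print(ww_hinge(np.array([0.7, 0.5, -1.0]), 0))  # -> 0.8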


ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy

Shepeleva, Natalia, Zellinger, Werner, Lewandowski, Michal, Moser, Bernhard

arXiv.org Machine Learning

We propose a new metric space of ReLU activation codes, equipped with a truncated Hamming distance, which establishes an isometry between its elements and polyhedral bodies in the input space; such bodies have recently been shown to be strongly related to safety, robustness, and confidence. This isometry allows the efficient computation of adjacency relations between the polyhedral bodies. Experiments on MNIST and CIFAR-10 indicate that information besides accuracy might be stored in the code space.
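
A minimal sketch of the basic objects, assuming a toy network with random weights: the binary activation code of a ReLU network and a plain Hamming distance between codes (the paper's truncated variant of the distance is not reproduced here):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy two-layer ReLU network with random weights, for illustration only.
    W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
    W2, b2 = rng.normal(size=(8, 16)), rng.normal(size=8)

    def relu_code(x):
        """Binary activation pattern ('ReLU code') of the network at x.
        Each bit records whether a unit's pre-activation is positive; inputs
        in the same linear region (polyhedral body) share the same code."""
        z1 = W1 @ x + b1
        z2 = W2 @ np.maximum(z1, 0.0) + b2
        return np.concatenate([z1 > 0, z2 > 0]).astype(np.uint8)

    def hamming(c1, c2):
        """Plain Hamming distance between two codes."""
        return int(np.sum(c1 != c2))

    x = rng.normal(size=4)
    print(hamming(relu_code(x), relu_code(x + 1e-3 * rng.normal(size=4))))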


Frank-Wolfe Splitting via Augmented Lagrangian Method

Gidel, Gauthier, Pedregosa, Fabian, Lacoste-Julien, Simon

arXiv.org Machine Learning

Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set, and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as the Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of the Frank-Wolfe and alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate over polytopes.
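
As a reminder of the building block involved, vanilla Frank-Wolfe only ever calls a linear minimization oracle (LMO) on its constraint set; FW-AL invokes such oracles on the individual sets inside an augmented Lagrangian loop. A sketch of the plain FW primitive on the probability simplex (this is the LMO-based building block, not the full FW-AL algorithm):

    import numpy as np

    def lmo_simplex(grad):
        """LMO over the probability simplex: the minimizer of <grad, s>
        over the simplex is the vertex with the smallest gradient entry."""
        s = np.zeros_like(grad)
        s[np.argmin(grad)] = 1.0
        return s

    def frank_wolfe(grad_f, x0, n_iters=500):
        """Vanilla Frank-Wolfe with the standard 2/(t+2) step size."""
        x = x0.copy()
        for t in range(n_iters):
            s = lmo_simplex(grad_f(x))
            x += 2.0 / (t + 2.0) * (s - x)
        return x

    # Example: project c onto the simplex by minimizing ||x - c||^2.
    c = np.array([0.1, 0.6, 0.9])
    print(frank_wolfe(lambda x: 2.0 * (x - c), np.ones(3) / 3))  # ~[0, 0.35, 0.65]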


Finite Local Consistency Characterizes Generalized Scoring Rules

Xia, Lirong (Duke University) | Conitzer, Vincent (Duke University)

AAAI Conferences

An important problem in computational social choice concerns whether it is possible to prevent manipulation of voting rules by making it computationally intractable. To answer this, a key question is how frequently voting rules are manipulable. We [Xia and Conitzer, 2008] recently defined the class of generalized scoring rules (GSRs) and characterized the frequency of manipulability for such rules. We showed, by examples, that most common rules seem to fall into this class. However, no natural axiomatic characterization of the class was given, leaving the possibility that there are natural rules to which these results do not apply. In this paper, we characterize the class of GSRs based on two natural properties: it is equal to the class of rules that are anonymous and finitely locally consistent. Generalized scoring rules also have other uses in computational social choice. For these uses, the order of the GSR (the dimension of its score vector) is important. Our characterization result implies that the order of a GSR is related to the minimum number of locally consistent components of the rule. We proceed to bound the minimum number of locally consistent components for some common rules.
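
To make the GSR structure concrete: a GSR is given by a function f mapping each vote to a score vector and a function g that selects the winner from the summed score vector using only the ordering of its components. Positional scoring rules such as Borda fit this template directly; a small Python sketch (an illustrative encoding, not code from the paper):

    def borda_score_vector(ranking, m):
        """f: map one vote (a ranking of alternatives, best first) to a
        score vector in R^m; Borda awards m-1, m-2, ..., 0 points."""
        scores = [0] * m
        for position, alt in enumerate(ranking):
            scores[alt] = m - 1 - position
        return scores

    def gsr_winner(profile, m):
        """g: decide from the summed score vector, using only the ordering
        of its components; for Borda, g is argmax (lowest index breaks ties)."""
        total = [0] * m
        for vote in profile:
            for alt, s in enumerate(borda_score_vector(vote, m)):
                total[alt] += s
        return max(range(m), key=lambda a: (total[a], -a))

    # Three voters over alternatives {0, 1, 2}: the totals are [5, 3, 1].
    print(gsr_winner([(0, 1, 2), (1, 0, 2), (0, 2, 1)], 3))  # -> 0

Here the score vector has dimension m, illustrating the "order" of a GSR mentioned in the abstract.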