Supplementary material: Weston-Watkins Hinge Loss and Ordered Partitions

Neural Information Processing Systems

References to contents in the supplementary material are prefixed with an "S", e.g., Lemma S3.20. References to contents in the main article do not have any prefix, e.g., Theorem 1.3 and Section 1.4. We introduce notations in addition to those already defined in the main article's Section 1.3. The exception is the last section, Section S6.2, where the explicit ... All vectors are column vectors unless stated otherwise. If j = 1, then the result is trivial. Now, for the first case, suppose that y ∉ {1, j}.





Consistency Conditions for Differentiable Surrogate Losses

Khurana, Drona, Thilagar, Anish, Kimpara, Dhamma, Frongillo, Rafael

arXiv.org Machine Learning

The statistical consistency of surrogate losses for discrete prediction tasks is often checked via the condition of calibration. However, directly verifying calibration can be arduous. Recent work shows that for polyhedral surrogates, a less arduous condition, indirect elicitation (IE), is still equivalent to calibration. We give the first results of this type for non-polyhedral surrogates, specifically the class of convex differentiable losses. We first prove that under mild conditions, IE and calibration are equivalent for one-dimensional losses in this class. We construct a counter-example that shows that this equivalence fails in higher dimensions. This motivates the introduction of strong IE, a strengthened form of IE that is equally easy to verify. We establish that strong IE implies calibration for differentiable surrogates and is both necessary and sufficient for strongly convex, differentiable surrogates. Finally, we apply these results to a range of problems to demonstrate the power of IE and strong IE for designing and analyzing consistent differentiable surrogates.
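
As background for the abstract above, calibration is typically formalized pointwise over label distributions. A standard rendering in the surrogate-risk literature (the notation here — surrogate L, link psi, target loss ell, optimal-report set Gamma(p) — is assumed for illustration, and the paper's exact definitions may differ in detail):

    % Calibration: no sequence of surrogate reports can approach the optimal
    % surrogate risk while linking to a suboptimal report for the target loss,
    % for every label distribution p.
    \[
    \inf_{u \,:\, \psi(u) \notin \Gamma(p)} \mathbb{E}_{Y \sim p}\, L(u, Y)
    \;>\; \inf_{u} \mathbb{E}_{Y \sim p}\, L(u, Y),
    \qquad
    \Gamma(p) := \arg\min_{r} \mathbb{E}_{Y \sim p}\, \ell(r, Y).
    \]

Indirect elicitation, by contrast, roughly asks only that exact surrogate minimizers link to optimal target reports, which is one reason it is easier to verify.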


On the connection between Bregman divergence and value in regularized Markov decision processes

O'Donoghue, Brendan

arXiv.org Artificial Intelligence

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others. The main result of this manuscript holds more generally, but for brevity we shall restrict ourselves to this case. To prove our main result we require a slight generalization of the performance difference lemma (PDL) [1] to cover the regularized MDP case. The proof of this identity is included in the appendix for completeness.
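
For context, the Bregman divergence generated by a strictly convex, differentiable function \Phi is defined as follows (a standard definition; the note's own regularizer and notation may differ):

    \[
    D_\Phi(x, y) \;=\; \Phi(x) - \Phi(y) - \langle \nabla \Phi(y),\, x - y \rangle .
    \]

With the negative-entropy generator \Phi(x) = \sum_i x_i \log x_i on the probability simplex, this specializes to the KL divergence D_\Phi(x, y) = \sum_i x_i \log(x_i / y_i), the case that arises in entropy-regularized MDPs.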


Weston-Watkins Hinge Loss and Ordered Partitions

Wang, Yutong, Scott, Clayton D.

arXiv.org Machine Learning

Multiclass extensions of the support vector machine (SVM) have been formulated in a variety of ways. A recent empirical comparison of nine such formulations [Doğan et al. 2016] recommends the variant proposed by Weston and Watkins (WW), despite the fact that the WW-hinge loss is not calibrated with respect to the 0-1 loss. In this work we introduce a novel discrete loss function for multiclass classification, the ordered partition loss, and prove that the WW-hinge loss is calibrated with respect to this loss. We also argue that the ordered partition loss is maximally informative among discrete losses satisfying this property. Finally, we apply our theory to justify the empirical observation made by Doğan et al. that the WW-SVM can work well even under massive label noise, a challenging setting for multiclass SVMs.
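
For concreteness, the WW hinge loss on a score vector f in R^k with true class y is usually written as sum over j != y of max(0, 1 - (f_y - f_j)). A minimal NumPy sketch of this standard form (the paper's normalization may differ slightly):

    import numpy as np

    def ww_hinge(scores, y):
        """Weston-Watkins hinge loss for one example.
        scores: (k,) array of class scores; y: index of the true class."""
        margins = 1.0 - (scores[y] - scores)  # 1 - (f_y - f_j) for each j
        margins[y] = 0.0                      # the j = y term is excluded
        return np.maximum(0.0, margins).sum()

    # True class 0 is ranked first but with a small margin over class 1,
    # so only the j = 1 term contributes: 0.8 (class 1) + 0 (class 2).
    print(ww_hinge(np.array([0.7, 0.5, -1.0]), 0))  # -> 0.8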


ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy

Shepeleva, Natalia, Zellinger, Werner, Lewandowski, Michal, Moser, Bernhard

arXiv.org Machine Learning

We propose a new metric space of ReLU activation codes, equipped with a truncated Hamming distance, which establishes an isometry between its elements and polyhedral bodies in the input space; such bodies have recently been shown to be strongly related to safety, robustness, and confidence. This isometry allows the efficient computation of adjacency relations between the polyhedral bodies. Experiments on MNIST and CIFAR-10 indicate that information besides accuracy might be stored in the code space.
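
A minimal sketch of the basic objects, assuming a toy network with random weights: the binary activation code of a ReLU network and a plain Hamming distance between codes (the paper's truncated variant of the distance is not reproduced here):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy two-layer ReLU network with random weights, for illustration only.
    W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
    W2, b2 = rng.normal(size=(8, 16)), rng.normal(size=8)

    def relu_code(x):
        """Binary activation pattern ('ReLU code') of the network at x.
        Each bit records whether a unit's pre-activation is positive; inputs
        in the same linear region (polyhedral body) share the same code."""
        z1 = W1 @ x + b1
        z2 = W2 @ np.maximum(z1, 0.0) + b2
        return np.concatenate([z1 > 0, z2 > 0]).astype(np.uint8)

    def hamming(c1, c2):
        """Plain Hamming distance between two codes."""
        return int(np.sum(c1 != c2))

    x = rng.normal(size=4)
    print(hamming(relu_code(x), relu_code(x + 1e-3 * rng.normal(size=4))))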


Frank-Wolfe Splitting via Augmented Lagrangian Method

Gidel, Gauthier, Pedregosa, Fabian, Lacoste-Julien, Simon

arXiv.org Machine Learning

Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set, and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as the Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of the Frank-Wolfe and alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate over polytopes.
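
As a reminder of the building block involved, vanilla Frank-Wolfe only ever calls a linear minimization oracle (LMO) on its constraint set; FW-AL invokes such oracles on the individual sets inside an augmented Lagrangian loop. A sketch of the plain FW primitive on the probability simplex (this is the LMO-based building block, not the full FW-AL algorithm):

    import numpy as np

    def lmo_simplex(grad):
        """LMO over the probability simplex: the minimizer of <grad, s>
        over the simplex is the vertex with the smallest gradient entry."""
        s = np.zeros_like(grad)
        s[np.argmin(grad)] = 1.0
        return s

    def frank_wolfe(grad_f, x0, n_iters=500):
        """Vanilla Frank-Wolfe with the standard 2/(t+2) step size."""
        x = x0.copy()
        for t in range(n_iters):
            s = lmo_simplex(grad_f(x))
            x += 2.0 / (t + 2.0) * (s - x)
        return x

    # Example: project c onto the simplex by minimizing ||x - c||^2.
    c = np.array([0.1, 0.6, 0.9])
    print(frank_wolfe(lambda x: 2.0 * (x - c), np.ones(3) / 3))  # ~[0, 0.35, 0.65]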


Finite Local Consistency Characterizes Generalized Scoring Rules

Xia, Lirong (Duke University) | Conitzer, Vincent (Duke University)

AAAI Conferences

An important problem in computational social choice concerns whether it is possible to prevent manipulation of voting rules by making it computationally intractable. To answer this, a key question is how frequently voting rules are manipulable. We [Xia and Conitzer, 2008] recently defined the class of generalized scoring rules (GSRs) and characterized the frequency of manipulability for such rules. We showed, by examples, that most common rules seem to fall into this class. However, no natural axiomatic characterization of the class was given, leaving the possibility that there are natural rules to which these results do not apply. In this paper, we characterize the class of GSRs based on two natural properties: it is equal to the class of rules that are anonymous and finitely locally consistent. Generalized scoring rules also have other uses in computational social choice. For these uses, the order of the GSR (the dimension of its score vector) is important. Our characterization result implies that the order of a GSR is related to the minimum number of locally consistent components of the rule. We proceed to bound the minimum number of locally consistent components for some common rules.
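
To make the GSR structure concrete: a GSR is given by a function f mapping each vote to a score vector and a function g that selects the winner from the summed score vector using only the ordering of its components. Positional scoring rules such as Borda fit this template directly; a small Python sketch (an illustrative encoding, not code from the paper):

    def borda_score_vector(ranking, m):
        """f: map one vote (a ranking of alternatives, best first) to a
        score vector in R^m; Borda awards m-1, m-2, ..., 0 points."""
        scores = [0] * m
        for position, alt in enumerate(ranking):
            scores[alt] = m - 1 - position
        return scores

    def gsr_winner(profile, m):
        """g: decide from the summed score vector, using only the ordering
        of its components; for Borda, g is argmax (lowest index breaks ties)."""
        total = [0] * m
        for vote in profile:
            for alt, s in enumerate(borda_score_vector(vote, m)):
                total[alt] += s
        return max(range(m), key=lambda a: (total[a], -a))

    # Three voters over alternatives {0, 1, 2}: the totals are [5, 3, 1].
    print(gsr_winner([(0, 1, 2), (1, 0, 2), (0, 2, 1)], 3))  # -> 0

Here the score vector has dimension m, illustrating the "order" of a GSR mentioned in the abstract.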