AITopics | generalized gradient

Collaborating Authors

generalized gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How to Backpropagate through Hungarian in Your DETR?

Chen, Lingji, Sharma, Alok, Shirore, Chinmay, Zhang, Chengjie, Buddharaju, Balarama Raju

arXiv.org Artificial IntelligenceNov-11-2022

The DEtection TRansformer (DETR) approach, which uses a transformer encoder-decoder architecture and a set-based global loss, has become a building block in many transformer based applications. However, as originally presented, the assignment cost and the global loss are not aligned, i.e., reducing the former is likely but not guaranteed to reduce the latter. And the issue of gradient is ignored when a combinatorial solver such as Hungarian is used. In this paper we show that the global loss can be expressed as the sum of an assignment-independent term, and an assignment-dependent term which can be used to define the assignment cost matrix. Recent results on generalized gradients of optimal assignment cost with respect to parameters of an assignment problem are then used to define generalized gradients of the loss with respect to network parameters, and backpropagation is carried out properly. Our experiments using the same loss weights show interesting convergence properties and a potential for further performance improvements.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2211.14448

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Combinatorial Losses through Generalized Gradients of Integer Linear Programs

Gao, Xi, Zhang, Han, Panahi, Aliakbar, Arodz, Tom

arXiv.org Machine LearningOct-17-2019

When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. Learning to recognize individual faces from group photos, each captioned with the correct but unordered list of people in it, is another example where a mismatch between training and inference objectives occurs. In both cases, the natural training-time loss would involve a combinatorial problem -- dynamic programming-based global sequence alignment and weighted bipartite graph matching, respectively -- but solutions to combinatorial problems are not differentiable with respect to their input parameters, so surrogate, differentiable losses are used instead. Here, we show how to perform gradient descent over combinatorial optimization algorithms that involve continuous parameters, for example edge weights, and can be efficiently expressed as integer, linear, or mixed-integer linear programs. We demonstrate usefulness of gradient descent over combinatorial optimization in sequence-to-sequence modeling using differentiable encoder-decoder architecture with softmax or Gumbel-softmax, and in weakly supervised learning involving a convolutional, residual feed-forward network for image classification.

combinatorial problem, generalized gradient, gradient, (15 more...)

arXiv.org Machine Learning

1910.08211

Country: North America > United States > Virginia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
(2 more...)

Add feedback

Learning in Riemannian Orbifolds

Jain, Brijnesh J., Obermayer, Klaus

arXiv.org Artificial IntelligenceApr-19-2012

Statistical data analysis and learning in Riemannian orbifolds is motivated by applications, where the data we want to learn on are naturally represented by finite combinatorial structures such as point patterns, trees, and graphs. Examples from structural pattern recognition that learn on structured data include estimating central points of a distribution on graphs such as the mean and median [9, 16, 15, 21], central clustering of graphs [10, 12, 13, 14, 19, 15, 23], learning graph quantization [17], and multilayer perceptrons for graphs [20]. In retrospect, the structure space framework proposed by [18] theoretically justifies the above approaches in the sense that they actually minimize an empirical risk function on structures. Since minimizing an empirical risk function is usually computationally intractable, the ultimate challenge consists in constructing efficient algorithms which are capable to return optimal or at least suboptimal solutions. From the point of view of statistical pattern recognition, however, the ultimate goal is not to determine a good solution of an empirical risk function, but rather to discover the true but unknown structure of the data with respect to its distribution.

artificial intelligence, machine learning, pattern recognition, (14 more...)

arXiv.org Artificial Intelligence

1204.4294

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Add feedback