AITopics | softmax cross-entropy

softmax cross-entropy

Hyperspherical Prototype Networks

Pascal Mettes, Elise van der Pol, Cees Snoek

Neural Information Processing Systems, Apr-30-2026, 20:24:36 GMT

This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We position prototypes through data-independent optimization, with an extension to incorporate priors from class semantics. By doing so, we do not require any prototype updating, we can handle any training size, and the output dimensionality is no longer constrained to the number of classes. Furthermore, we generalize to regression, by optimizing outputs as an interpolation between two prototypes on the hypersphere. Since both tasks are now defined by the same loss function, they can be jointly trained for multi-task problems. Experimentally, we show the benefit of hyperspherical prototype networks for classification, regression, and their combination over other prototype methods, softmax cross-entropy, and mean squared error approaches.
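The idea of fixed, well-separated prototypes on a hypersphere can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: the output space is the 2-D unit circle (where evenly spaced points give the largest separation), the loss is the squared cosine distance to the class prototype, and prediction picks the prototype with the highest cosine similarity. The function names and the example output vector are hypothetical.

```python
import math

def circle_prototypes(num_classes):
    """Place num_classes unit vectors evenly on the unit circle (assumed 2-D case)."""
    return [
        (math.cos(2 * math.pi * k / num_classes),
         math.sin(2 * math.pi * k / num_classes))
        for k in range(num_classes)
    ]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototype_loss(output, prototype):
    """Assumed loss: squared cosine distance between an output and its prototype."""
    return (1.0 - cosine(output, prototype)) ** 2

def predict(output, prototypes):
    """Return the index of the prototype most similar to the output."""
    return max(range(len(prototypes)), key=lambda k: cosine(output, prototypes[k]))

# Prototypes are fixed before training and never updated.
prototypes = circle_prototypes(4)
output = (0.1, 2.0)  # made-up, un-normalized network output
predicted = predict(output, prototypes)
loss = prototype_loss(output, prototypes[predicted])
```

Because the prototypes depend only on the number of classes and the output dimensionality, nothing about them changes as training data is added, which matches the abstract's point that no prototype updating is required.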

artificial intelligence, machine learning, prototype, (15 more...)

Country: Europe > Netherlands > North Holland > Amsterdam (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

02a32ad2669e6fe298e607fe7cc0e1a0-AuthorFeedback.pdf

Neural Information Processing Systems, Apr-30-2026, 20:24:22 GMT

We thank all the reviewers (R1, R2, R3) for their feedback and suggestions. Table A: Multi-task comparison across task weights. We have performed loss balancing with five different weights t in the multi-task loss L_m = t L_c + (1 - t) L_r for the classification and regression losses. The results on OmniArt are reported in Table A. Our proposal is robust to the weight value; tuning the task weight is not vital. We obtain a moderate gain for both classification and regression with a weight of t = 0.25. For the multi-task baseline, emphasizing regression reduces the regression error, as the gradient magnitude of the regression loss is much lower than the one for the classification loss.
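Reading the weighted objective as L_m = t L_c + (1 - t) L_r, the balancing step can be sketched as a convex combination of the two task losses. The function name and the loss values below are hypothetical and chosen only for illustration; they are not results from the feedback document.

```python
def multi_task_loss(classification_loss, regression_loss, t):
    """Combine two task losses as t * L_c + (1 - t) * L_r, with t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("task weight t must lie in [0, 1]")
    return t * classification_loss + (1.0 - t) * regression_loss

# Made-up loss values; t = 0.25 puts most of the weight on regression.
example = multi_task_loss(classification_loss=2.0, regression_loss=0.4, t=0.25)
```

Setting t = 1 recovers pure classification and t = 0 pure regression, so sweeping t over a handful of values is a simple way to check how sensitive a model is to the balance.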

artificial intelligence, dimension, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Hyperspherical Prototype Networks

Pascal Mettes, Elise van der Pol, Cees Snoek

Neural Information Processing Systems, Feb-19-2026, 09:47:07 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (16 more...)

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New Jersey (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.96)

f0bf4a2da952528910047c31b6c2e951-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 20:40:50 GMT

accuracy, class separation, representation, (11 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Michigan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

bc8f76d9caadd48f77025b1c889d2e2d-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-18-2025, 08:55:25 GMT

Table 1 summarises various approaches for learning to defer.

artificial intelligence, exp, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

f0bf4a2da952528910047c31b6c2e951-Paper.pdf

Neural Information Processing Systems, May-30-2025, 10:56:48 GMT

Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between hidden representations of networks, we find that differences among loss functions are apparent only in the last few layers of the network. We delve deeper into representations of the penultimate layer, finding that different objectives and hyperparameter combinations lead to dramatically different levels of class separation. Representations with higher class separation obtain higher accuracy on the original task, but their features are less useful for downstream tasks. Our results suggest there exists a trade-off between learning invariant features for the original task and features relevant for transfer tasks.
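Centered kernel alignment compares two sets of representations computed on the same examples. The sketch below assumes the linear-kernel variant, which the abstract does not specify, and uses only the Python standard library. The helper names and the toy feature matrices are hypothetical.

```python
import math

def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]

def cross_product(a, b):
    """Return a^T b for two matrices with the same number of rows."""
    return [
        [sum(x * y for x, y in zip(col_a, col_b)) for col_b in zip(*b)]
        for col_a in zip(*a)
    ]

def frobenius(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

def linear_cka(x, y):
    """Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    x, y = center_columns(x), center_columns(y)
    numerator = frobenius(cross_product(x, y)) ** 2
    denominator = frobenius(cross_product(x, x)) * frobenius(cross_product(y, y))
    return numerator / denominator  # undefined if either input is constant

# Toy features: 4 examples, 2 features each (made up for illustration).
features = [[1.0, 2.0], [2.0, 0.5], [3.0, 4.0], [4.0, 1.0]]
rotated = [[-b, a] for a, b in features]       # 90-degree rotation of the features
other = [[0.5], [1.5], [-2.0], [3.0]]          # unrelated 1-D representation

same_score = linear_cka(features, features)
rotated_score = linear_cka(features, rotated)
other_score = linear_cka(features, other)
```

Linear CKA is unchanged by rotating or uniformly rescaling either representation, so `rotated_score` matches `same_score` up to floating-point error. That invariance is what makes the measure suitable for comparing layers across networks trained with different objectives.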

accuracy, artificial intelligence, machine learning, (14 more...)

Country: