AITopics | temperature scaling

temperature scaling

Towards Accurate and Calibrated Classification: Regularizing Cross-Entropy From A Generative Perspective

Zhan, Qipeng, Zhou, Zhuoping, Shen, Li

arXiv.org Machine Learning, Apr-9-2026

Accurate classification requires not only high predictive accuracy but also well-calibrated confidence estimates. Yet, modern deep neural networks (DNNs) are often overconfident, primarily due to overfitting on the negative log-likelihood (NLL). While focal loss variants alleviate this issue, they typically reduce accuracy, revealing a persistent trade-off between calibration and predictive performance. Motivated by the complementary strengths of generative and discriminative classifiers, we propose Generative Cross-Entropy (GCE), which maximizes $p(x|y)$ and is equivalent to cross-entropy augmented with a class-level confidence regularizer. Under mild conditions, GCE is strictly proper. Across CIFAR-10/100, Tiny-ImageNet, and a medical imaging benchmark, GCE improves both accuracy and calibration over cross-entropy, especially in the long-tailed scenario. Combined with adaptive piecewise temperature scaling (ATS), GCE attains calibration competitive with focal-loss variants without sacrificing accuracy.
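
One way to read the claimed equivalence between maximizing $p(x|y)$ and a regularized cross-entropy is through Bayes' rule. The identity below is standard; labelling $\log p(y)$ as the class-level term is an assumption made here for illustration, since the abstract does not state how the class marginal is estimated or what exact form the regularizer takes.

```latex
\begin{align}
\log p(x \mid y) &= \log p(y \mid x) + \log p(x) - \log p(y), \\
-\log p(x \mid y) &= \underbrace{-\log p(y \mid x)}_{\text{cross-entropy term}}
  \;+\; \underbrace{\log p(y)}_{\text{class-level term (assumed)}}
  \;-\; \underbrace{\log p(x)}_{\text{does not depend on } y}.
\end{align}
```

Under this reading, minimizing the left-hand side adds a per-class term to the usual cross-entropy; the details of GCE itself should be taken from the paper.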

artificial intelligence, calibration, machine learning, (16 more...)

2604.06689

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

fcc22e5b7d5d2155d994da22d045f0a6-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 19:29:51 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

f8905bd3df64ace64a68e154ba72f24c-Supplemental.pdf

Neural Information Processing Systems, Feb-12-2026, 00:07:07 GMT

dataset, secondary loss, soft calibration objective, (13 more...)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

a70dc40477bc2adceef4d2c90f47eb82-Paper.pdf

Neural Information Processing Systems, Feb-10-2026, 12:17:01 GMT

dataset, neural network, prediction, (15 more...)

Country:

Asia > Singapore (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Edge-aware baselines for ogbn-proteins in PyTorch Geometric: species-wise normalization, post-hoc calibration, and cost-accuracy trade-offs

Stanković, Aleksandar, Lisica, Dejan

arXiv.org Artificial Intelligence, Nov-18-2025

We present reproducible, edge-aware baselines for ogbn-proteins in PyTorch Geometric (PyG). We study two system choices that dominate practice: (i) how 8-dimensional edge evidence is aggregated into node inputs, and (ii) how edges are used inside message passing. Our strongest baseline is GraphSAGE with sum-based edge-to-node features. We compare LayerNorm (LN), BatchNorm (BN), and a species-aware Conditional LayerNorm (CLN), and report compute cost (time, VRAM, parameters) together with accuracy (ROC-AUC) and decision quality. In our primary experimental setup (hidden size 512, 3 layers, 3 seeds), sum consistently beats mean and max; BN attains the best AUC, while CLN matches the AUC frontier with better thresholded F1. Finally, post-hoc per-label temperature scaling plus per-label thresholds substantially improves micro-F1 and expected calibration error (ECE) with negligible AUC change, and light label-correlation smoothing yields small additional gains. We release standardized artifacts and scripts used for all of the runs presented in the paper.
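
The post-hoc step described above (per-label temperature scaling followed by per-label thresholds) can be sketched as follows. This is a minimal NumPy illustration, not the authors' released scripts: the function names, the grid-search fitting procedure, and the grid ranges are all assumptions chosen for clarity.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def binary_nll(logits, labels, temperature):
    """Mean binary negative log-likelihood of sigmoid(logits / temperature)."""
    p = np.clip(sigmoid(logits / temperature), 1e-12, 1.0 - 1e-12)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def fit_per_label_temperature(val_logits, val_labels, grid=None):
    """Pick one temperature per label by grid search on validation NLL.

    The log-spaced grid is a hypothetical choice; a 1-D optimizer would also work.
    """
    if grid is None:
        grid = np.geomspace(0.05, 20.0, 200)
    n_labels = val_logits.shape[1]
    temps = np.ones(n_labels)
    for j in range(n_labels):
        losses = [binary_nll(val_logits[:, j], val_labels[:, j], t) for t in grid]
        temps[j] = grid[int(np.argmin(losses))]
    return temps

def fit_per_label_thresholds(val_probs, val_labels, grid=None):
    """Pick one decision threshold per label by maximizing validation F1."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    n_labels = val_probs.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        y = val_labels[:, j] == 1
        best_f1 = -1.0
        for th in grid:
            pred = val_probs[:, j] >= th
            tp = np.sum(pred & y)
            fp = np.sum(pred & ~y)
            fn = np.sum(~pred & y)
            f1 = 2.0 * tp / max(2 * tp + fp + fn, 1)
            if f1 > best_f1:
                best_f1, thresholds[j] = f1, th
    return thresholds

def apply_calibration(logits, temps, thresholds):
    """Return calibrated probabilities and thresholded 0/1 predictions."""
    probs = sigmoid(logits / temps)
    return probs, (probs >= thresholds).astype(int)
```

Because dividing a label's logits by a positive constant is monotone, the ranking of examples within each label is unchanged, which is consistent with the abstract's observation that AUC barely moves while thresholded F1 and ECE can improve.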

artificial intelligence, machine learning, threshold, (18 more...)

2511.1325

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models

Pinneri, Cristina, Louizos, Christos

arXiv.org Artificial Intelligence, Nov-17-2025

Guard models are a critical component of LLM safety, but their sensitivity to superficial linguistic variations remains a key vulnerability. We show that even meaning-preserving paraphrases can cause large fluctuations in safety scores, revealing a lack of semantic grounding. To address this, we introduce a practical, self-supervised framework for improving the semantic robustness of guard models. Our method leverages paraphrase sets to enforce prediction consistency using a novel, skew-aware aggregation strategy for robust target computation. Notably, we find that standard aggregation methods like mean and median can degrade safety, underscoring the need for skew-aware alternatives. We analyze six open-source guard models and show that our approach reduces semantic variability across paraphrases by ~58%, improves benchmark accuracy by ~2.5% on average, and generalizes to unseen stylistic variations. Intriguingly, we discover a bidirectional relationship between model calibration and consistency: our robustness training improves calibration by up to 40%, revealing a fundamental connection between these properties. These results highlight the value of treating semantic consistency as a first-class training objective and provide a scalable recipe for building more reliable guard models.

calibration, large language model, machine learning, (16 more...)

2511.10665

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Appendix

Neural Information Processing Systems, Nov-14-2025, 20:10:39 GMT

The Appendix is structured as follows: A. Models and Datasets: details and references for the models and datasets used in this work. Table 1 provides an overview of the models used in this study. A.2 Datasets: We evaluate accuracy and calibration on the following benchmark datasets: 1. V2 (Recht et al., 2019) is a new I The dataset contains 10 000 images. 3. In addition, the following datasets are used for pretraining as described in the text: 1.

calibration, classification error, variant, (14 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Diverse Preference Learning for Capabilities and Alignment

Slocum, Stewart, Parker-Sartori, Asher, Hadfield-Menell, Dylan

arXiv.org Artificial Intelligence, Nov-13-2025

The ability of LLMs to represent diverse perspectives is critical as they increasingly impact society. However, recent studies reveal that alignment algorithms such as RLHF and DPO significantly reduce the diversity of LLM outputs. Not only do aligned LLMs generate text with repetitive structure and word choice, they also approach problems in more uniform ways, and their responses reflect a narrower range of societal perspectives. We attribute this problem to the KL divergence regularizer employed in preference learning algorithms. This causes the model to systematically overweight majority opinions and sacrifice diversity in its outputs. To address this, we propose Soft Preference Learning, which decouples the entropy and cross-entropy terms in the KL penalty, allowing for fine-grained control over LLM generation diversity. From a capabilities perspective, LLMs trained using Soft Preference Learning attain higher accuracy on difficult repeated sampling tasks and produce outputs with greater semantic and lexical diversity. From an alignment perspective, they are capable of representing a wider range of societal viewpoints and display improved logit calibration. Notably, Soft Preference Learning resembles, but is a Pareto improvement over, standard temperature scaling.

As LLMs become integrated into how people consume information (Bick et al., 2024) and approach tasks (Deloitte, 2024), their ability to represent diverse perspectives is critical. For example, consider an LLM answering the following multiple-choice question:

The best way to reduce income inequality is:
(A) Increase minimum wage
(B) Expand access to education and job training
(C) Implement universal basic income
(D) Lower taxes on the wealthy to stimulate job creation

Imagine a survey showing people's preferences as: A (55%), B (20%), C (15%), and D (10%). How should an LLM respond to this question? Ideally, we may prefer it to reflect the range of views in the population. If an LLM assigns 99% probability to majority option A, it fails to represent the diversity of perspectives. With LLMs becoming important information sources, this may reinforce dominant narratives at the expense of minority views. However, recent studies show that alignment algorithms such as RLHF and DPO significantly reduce the diversity of LLM outputs. This leads to mode collapse towards majority preferences, as the example above shows (Kirk et al., 2024; Padmakumar & He, 2024; Rafailov et al., 2024; Christiano et al., 2023). In a generative setting, this results in repetitive responses, as illustrated in Figure 1. For example, the DPO model frequently uses the same doctor's name and [...]

Figure 1 (caption fragment): We highlight doctor name, gender, and textual aberration features shown in the plots on the right. DPO responses are well-formed but lack diversity (e.g.
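
To make the survey example concrete, the sketch below applies plain temperature scaling to the distribution A (55%), B (20%), C (15%), D (10%) and checks the standard identity KL(p || q) = H(p, q) - H(p), which is the decomposition into cross-entropy and entropy terms that the abstract refers to. It illustrates ordinary temperature scaling only; the Soft Preference Learning objective itself is not reproduced here, and the helper names are hypothetical.

```python
import numpy as np

def temperature_scale(probs, temperature):
    """Rescale a distribution as p_i^(1/T) / sum_j p_j^(1/T)."""
    logits = np.log(np.asarray(probs, dtype=float))
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()

def entropy(p):
    """Shannon entropy H(p) in nats (assumes strictly positive entries)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(-np.sum(p * np.log(q)))

def kl(p, q):
    """KL divergence KL(p || q) in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Survey preferences for options A, B, C, D from the example above.
survey = np.array([0.55, 0.20, 0.15, 0.10])

for t in (0.25, 1.0, 4.0):
    p = temperature_scale(survey, t)
    print(f"T={t}: probs={np.round(p, 3)}, entropy={entropy(p):.3f}")
```

At T = 1 the survey distribution is recovered unchanged. Lowering the temperature concentrates mass on the majority option (at T = 0.25, option A receives roughly 98% of the probability), which mirrors the mode collapse described in the text, while raising it flattens the distribution.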

large language model, machine learning, natural language, (20 more...)

2511.08594

Genre: Research Report > New Finding (0.48)

Industry:

Government (0.66)
Education (0.54)
Banking & Finance > Economy (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Confidence Calibration in Large Language Model-Based Entity Matching

Kamsteeg, Iris, Cardenas-Cartagena, Juan, van Beers, Floris, Holt, Gineke ten, Tashu, Tsegaye Misikir, Valdenegro-Toro, Matias

arXiv.org Artificial Intelligence, Oct-17-2025

This research aims to explore the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study to compare baseline RoBERTa confidences for an Entity Matching task against confidences that are calibrated using Temperature Scaling, Monte Carlo Dropout and Ensembles. We use the Abt-Buy, DBLP-ACM, iTunes-Amazon and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits a slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated using Temperature Scaling, reducing Expected Calibration Error scores by up to 23.83%.
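
For readers unfamiliar with the two quantities compared above, here is a minimal sketch of Expected Calibration Error (equal-width confidence bins) and single-parameter temperature scaling for a two-class matcher. It is not the paper's code: the bin count, the grid-search fit, and all function names are assumptions made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    confidence = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

def fit_temperature(val_logits, val_labels, grid=None):
    """Choose the temperature that minimizes validation NLL (grid search)."""
    if grid is None:
        grid = np.geomspace(0.05, 20.0, 200)  # hypothetical search range
    rows = np.arange(len(val_labels))
    best_t, best_nll = 1.0, np.inf
    for t in grid:
        p = np.clip(softmax(val_logits, t)[rows, val_labels], 1e-12, 1.0)
        nll = -np.mean(np.log(p))
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t
```

A typical use is to fit the temperature on a held-out validation split, then report ECE on the test split before and after scaling; the relative reduction is `(ece_before - ece_after) / ece_before`.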

large language model, machine learning, natural language, (18 more...)

2509.19557

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology: