AITopics

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Information Processing Systems, Feb-11-2026, 13:35:41 GMT

e0308d73972d8dd5e2dd27853106386e-Paper.pdf

Although deep learning programs have demonstrated strong performance on novel applications, they sacrifice many of the functionalities of traditional software programs.

artificial intelligence, deep learning, machine learning, (16 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial Intelligence, Nov-3-2025

Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring

Jiao, Hong, Choi, Hanna, Hua, Haowei

Hong Jiao (University of Maryland, College Park), Hanna Choi (University of Maryland, College Park), Haowei Hua (Princeton University)

Abstract: This study explored the utilities of rationales generated by GPT-4.1 and GPT-5 in automated scoring, using Prompt 6 essays from the 2012 Kaggle ASAP data. Essay-based scoring was compared with rationale-based scoring. The study found that, in general, essay-based scoring performed better than rationale-based scoring, with higher Quadratic Weighted Kappa (QWK). However, rationale-based scoring led to higher scoring accuracy in terms of F1 scores for score 0, which had less representation due to class imbalance. Ensemble modeling of the essay-based scoring models increased scoring accuracy at both specific score levels and across all score levels. The ensembles of essay-based scoring with each of the rationale-based scoring models performed about the same. A further ensemble of essay-based scoring and both rationale-based scoring models yielded the best scoring accuracy, with a QWK of 0.870 compared with 0.848 reported in the literature.

Introduction: Automated essay scoring methodology develops along with advances in AI technology. From the early supervised machine learning models based on engineered features (e.g., Mahana et al., 2012) to the recent use of large language models (LLMs), the methods for automated essay scoring, as demonstrated in Appendix A, evolved with advances in machine learning, deep learning, language models, and LLMs. Using automated scoring of Prompt 6 in the Automated Student Assessment Prize (ASAP) dataset from Kaggle, this study intends to explore the utility of rationales generated by LLMs in enhancing automated essay scoring. For ASAP Prompt 6, automated scoring models have been developed since the 2012 Kaggle competition.
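The abstract above reports agreement as Quadratic Weighted Kappa (QWK). As a minimal sketch of how that metric is commonly computed (the standard definition, not code from the paper), the function below builds the observed and chance-expected rating matrices and applies quadratic disagreement weights. The function name, the argument names, and the toy scores are illustrative assumptions, as is the convention of returning 1.0 when all ratings fall in a single category.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Standard QWK between two lists of integer scores (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("need at least two score categories")

    # Observed confusion matrix between the two sets of scores.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / total  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Assumed convention: all scores sit in one category for both raters.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Toy usage with made-up scores on an assumed 0-4 scale.
human = [0, 1, 2, 3, 4, 2]
model = [0, 1, 2, 3, 4, 3]
print(quadratic_weighted_kappa(human, model, 0, 4))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.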

large language model, machine learning, natural language, (19 more...)

2510.27131

Country: North America > United States > Maryland > Prince George's County > College Park (0.44)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer-Aided Assessment (1.00)
Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Aug-19-2025, 08:39:48 GMT

A Societal impact Deep ensembles are popular in many real world applications, and a potential negative impact of our

Figure 1 provides the rESCE in-distribution vs. out-of-distribution, for CIFAR10 vs. CIFAR10.1 and ImageNet vs. ImageNetV2.

artificial intelligence, machine learning, metric, (13 more...)

Genre: Research Report > New Finding (0.46)

Industry: Social Sector (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Information Processing Systems, Aug-17-2025, 23:02:28 GMT

A Appendix

IND data and does not change accuracy.

artificial intelligence, cifar10 95, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Rosenblatt, Lucas, Witter, R. Teal

FairlyUncertain: A Comprehensive Benchmark of Uncertainty in Algorithmic Fairness

arXiv.org Machine Learning, Oct-2-2024

Fair predictive algorithms hinge on both equality and trust, yet inherent uncertainty in real-world data challenges our ability to make consistent, fair, and calibrated decisions. While fairly managing predictive error has been extensively explored, some recent work has begun to address the challenge of fairly accounting for irreducible prediction uncertainty. However, a clear taxonomy and well-specified objectives for integrating uncertainty into fairness remain undefined. We address this gap by introducing FairlyUncertain, an axiomatic benchmark for evaluating uncertainty estimates in fairness. Our benchmark posits that fair predictive uncertainty estimates should be consistent across learning pipelines and calibrated to observed randomness. Through extensive experiments on ten popular fairness datasets, our evaluation reveals: (1) A theoretically justified and simple method for estimating uncertainty in binary settings is more consistent and calibrated than prior work; (2) Abstaining from binary predictions, even with improved uncertainty estimates, reduces error but does not alleviate outcome imbalances between demographic groups; (3) Incorporating consistent and calibrated uncertainty estimates in regression tasks improves fairness without any explicit fairness interventions. Additionally, our benchmark package is designed to be extensible and open-source, to grow with the field. By providing a standardized framework for assessing the interplay between uncertainty and fairness, FairlyUncertain paves the way for more equitable and trustworthy machine learning practices.

prediction, standard deviation, uncertainty estimate, (12 more...)

2410.02005

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Gharoun, Hassan, Khorshidi, Mohammad Sadegh, Chen, Fang, Gandomi, Amir H.

Trust-informed Decision-Making Through An Uncertainty-Aware Stacked Neural Networks Framework: Case Study in COVID-19 Classification

arXiv.org Artificial Intelligence, Sep-19-2024

This study presents an uncertainty-aware stacked neural networks model for the reliable classification of COVID-19 from radiological images. The model addresses the critical gap in uncertainty-aware modeling by focusing on accurately identifying confidently correct predictions while alerting users to confidently incorrect and uncertain predictions, which can promote trust in automated systems. The architecture integrates uncertainty quantification methods, including Monte Carlo dropout and ensemble techniques, to enhance predictive reliability by assessing the certainty of diagnostic predictions. Within a two-tier model framework, the tier one model generates initial predictions and associated uncertainties, which the second tier model uses to produce a trust indicator alongside the diagnostic outcome. This dual-output model not only predicts COVID-19 cases but also provides a trust flag, indicating the reliability of each diagnosis and aiming to minimize the need for retesting and expert verification. The effectiveness of this approach is demonstrated through extensive experiments on the COVIDx CXR-4 dataset, showing a novel approach in identifying and handling confidently incorrect cases and uncertain cases, thus enhancing the trustworthiness of automated diagnostics in clinical settings.
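The abstract above mentions Monte Carlo dropout as one source of uncertainty and a trust flag as the second output. The sketch below illustrates only the general idea, under loud assumptions: it takes class-probability vectors from several stochastic forward passes (hypothetical inputs here), averages them, measures predictive entropy, and thresholds that entropy. The threshold rule is a simple stand-in; in the paper the trust indicator comes from a second-tier model, which is not reproduced here. All function and parameter names are illustrative.

```python
import math


def predictive_summary(sampled_probs):
    """Summarize T stochastic passes, each a list of class probabilities."""
    num_passes = len(sampled_probs)
    num_classes = len(sampled_probs[0])
    # Mean predictive distribution across the stochastic passes.
    mean = [
        sum(p[c] for p in sampled_probs) / num_passes
        for c in range(num_classes)
    ]
    # Predictive entropy of the mean distribution (natural log).
    entropy = -sum(m * math.log(m) for m in mean if m > 0)
    label = max(range(num_classes), key=lambda c: mean[c])
    return label, mean, entropy


def trust_flag(entropy, num_classes, max_fraction=0.5):
    """Assumed stand-in rule: trust when entropy is at most a fraction
    of the maximum possible entropy, log(num_classes)."""
    return entropy <= max_fraction * math.log(num_classes)


# Hypothetical outputs of three dropout-enabled passes on one image.
passes = [[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]]
label, mean, entropy = predictive_summary(passes)
print(label, mean, entropy, trust_flag(entropy, num_classes=2))
```

Passes that disagree with each other pull the mean distribution toward uniform, which raises the entropy and clears the flag.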

artificial intelligence, machine learning, prediction, (17 more...)

2410.02805

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.90)
Health & Medicine > Therapeutic Area > Immunology (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zolnai-Lucas, Aaron, Boylan, Jack, Hokamp, Chris, Ghaffari, Parsa

STAGE: Simplified Text-Attributed Graph Embeddings Using Pre-trained LLMs

arXiv.org Artificial Intelligence, Jul-10-2024

We present Simplified Text-Attributed Graph Embeddings (STAGE), a straightforward yet effective method for enhancing node features in Graph Neural Network (GNN) models that encode Text-Attributed Graphs (TAGs). Our approach leverages Large Language Models (LLMs) to generate embeddings for textual attributes. STAGE achieves competitive results on various node classification benchmarks while remaining simpler to implement than current state-of-the-art (SoTA) techniques. We show that utilizing pre-trained LLMs as embedding generators provides robust features for ensemble GNN training, enabling pipelines that are simpler than current SoTA approaches, which require multiple expensive training and prompting stages. We also implement diffusion-pattern GNNs in an effort to make this pipeline scalable to graphs beyond academic benchmarks.

dataset, llm, prediction, (15 more...)

2407.1286

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Mendes, Pedro, Romano, Paolo, Garlan, David

Error-Driven Uncertainty Aware Training

arXiv.org Artificial Intelligence, May-2-2024

Neural networks are often overconfident about their predictions, which undermines their reliability and trustworthiness. In this work, we present a novel technique, named Error-Driven Uncertainty Aware Training (EUAT), which aims to enhance the ability of neural models to estimate their uncertainty correctly, namely to be highly uncertain when they output inaccurate predictions and to have low uncertainty when their output is accurate. The EUAT approach operates during the model's training phase by selectively employing two loss functions depending on whether the training examples are correctly or incorrectly predicted by the model. This allows for pursuing the twofold goal of i) minimizing model uncertainty for correctly predicted inputs and ii) maximizing uncertainty for mispredicted inputs, while preserving the model's misprediction rate. We evaluate EUAT using diverse neural models and datasets in the image recognition domain, considering both non-adversarial and adversarial settings. The results show that EUAT outperforms existing approaches for uncertainty estimation (including other uncertainty-aware training techniques, calibration, ensembles, and DEUP) by providing uncertainty estimates that have higher quality not only when evaluated via statistical metrics (e.g., correlation with residuals) but also when employed to build binary classifiers that decide whether the model's output can be trusted, and under distributional data shifts.
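To make the idea of switching between two losses concrete, here is a minimal per-example sketch. It is not the EUAT loss from the paper, whose exact form is not given in the abstract; it assumes, for illustration only, that uncertainty is measured by predictive entropy, that correctly predicted examples are penalized for entropy, and that mispredicted examples are rewarded for it. The function names and the `weight` parameter are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class (clamped for safety)."""
    return -math.log(max(probs[target], 1e-12))


def entropy(probs):
    """Predictive entropy, used here as the assumed uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def error_driven_loss(probs, target, weight=1.0):
    """Illustrative two-branch loss, selected by prediction correctness."""
    predicted = max(range(len(probs)), key=lambda c: probs[c])
    if predicted == target:
        # Correct prediction: also push uncertainty down.
        return cross_entropy(probs, target) + weight * entropy(probs)
    # Misprediction: keep the fitting term, but reward high uncertainty.
    return cross_entropy(probs, target) - weight * entropy(probs)


# Toy usage with made-up softmax outputs.
print(error_driven_loss([0.9, 0.1], target=0))  # correct and confident
print(error_driven_loss([0.9, 0.1], target=1))  # wrong and confident
```

In a real training loop the same branching would be applied per batch on differentiable tensors; the plain-Python version only shows which term each example receives.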

baseline, euat, prediction, (14 more...)

2405.01205

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Goswami, Dhiman, Puspo, Sadiya Sayara Chowdhury, Raihan, Md Nishat, Emran, Al Nahian Bin, Ganguly, Amrita, Zampieri, Marcos

MasonTigers at SemEval-2024 Task 1: An Ensemble Approach for Semantic Textual Relatedness

arXiv.org Artificial Intelligence, Apr-5-2024

This paper presents the MasonTigers entry to SemEval-2024 Task 1: Semantic Textual Relatedness. The task encompasses supervised (Track A), unsupervised (Track B), and cross-lingual (Track C) approaches across 14 different languages. MasonTigers stands out as one of the two teams that participated in all languages across the three tracks. Our approaches achieved rankings ranging from 11th to 21st in Track A, from 1st to 8th in Track B, and from 5th to 12th in Track C. Adhering to the task-specific constraints, our best-performing approaches utilize an ensemble of statistical machine learning approaches combined with language-specific BERT-based models and sentence transformers.

arabic, ensemble 0, lr 0, (15 more...)

2403.1499

Country:

North America > United States (0.06)
Asia > Middle East > Jordan (0.04)
Africa (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)