Response to Reviewer 2: Empirical evaluation: Interestingly, we actually did an empirical evaluation in the earlier
We thank the reviewers for the positive feedback and their interest in our work! Below we address some questions. Both algorithms are well tuned for hyperparameters. We did not include it in the submission because after all the
We will make sure to define these terms earlier in the paper in the revision, and we are happy to clarify them.
Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks
Kevin Wang, Subre Abdoul Moktar, Jia Li, Kangshuo Li, Feng Chen
Large Language Models (LLMs) have become increasingly pervasive, finding applications across many industries and disciplines. Ensuring the trustworthiness of LLM outputs is therefore paramount, and Uncertainty Estimation (UE) plays a key role. In this work, we conduct a comprehensive empirical study of the robustness and effectiveness of diverse UE measures for aleatoric and epistemic uncertainty in LLMs. The study covers twelve UE methods and four generation-quality metrics, including LLMScore from LLM criticizers, to evaluate the uncertainty of LLM-generated answers in Question-Answering (QA) tasks on both in-distribution (ID) and out-of-distribution (OOD) datasets. Our analysis reveals that information-based methods, which leverage token and sequence probabilities, perform exceptionally well in ID settings due to their alignment with the model's understanding of the data. Conversely, density-based methods and the P(True) metric exhibit superior performance in OOD contexts, highlighting their effectiveness in capturing the model's epistemic uncertainty. Semantic consistency methods, which assess variability in generated answers, perform reliably across datasets and generation metrics; they generally do well but may not be optimal in every situation.
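As a concrete illustration of the two families of measures mentioned above, the sketch below computes a length-normalized negative log-likelihood (an information-based measure) and a simple consistency-based uncertainty from repeated samples. The token probabilities and sampled answers are hypothetical placeholders; a real system would obtain them from the LLM.

```python
import math
from collections import Counter

def length_normalized_nll(token_probs):
    """Information-based uncertainty: average negative log-probability
    of the generated tokens (lower probability -> higher uncertainty)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def consistency_uncertainty(sampled_answers):
    """Consistency-based uncertainty: 1 minus the frequency of the most
    common answer among samples (more disagreement -> higher uncertainty)."""
    counts = Counter(sampled_answers)
    return 1.0 - counts.most_common(1)[0][1] / len(sampled_answers)

# Hypothetical per-token probabilities for two candidate answers.
confident = [0.9, 0.8, 0.95]
uncertain = [0.5, 0.4, 0.6]
assert length_normalized_nll(confident) < length_normalized_nll(uncertain)

# Hypothetical repeated samples for one question.
samples = ["Paris", "Paris", "Paris", "Lyon"]
print(round(consistency_uncertainty(samples), 2))  # 0.25
```

Note that the information-based score needs white-box access to token probabilities, whereas the consistency score only needs repeated sampling, which partly explains their different ID/OOD behavior.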
It is known that adding additive Gaussian noise to the features is equivalent to an l_2 regularizer in a least-squares problem (Bishop). This paper studies multiplicative Bernoulli feature noising in a shallow learning architecture with a general loss function and shows that it has the effect of adapting the geometry through an l_2 regularizer that rescales the features (beta^{\top} D(beta, X) beta). The matrix D(beta, X) is an estimate of the inverse diagonal Fisher information. It is worth noting that D does not depend on the labels. The equivalent regularizer of dropout is non-convex in general.
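The squared-loss special case of this equivalence can be checked numerically: averaging the loss over all Bernoulli dropout masks exactly reproduces the clean loss plus an l_2-style penalty whose weights are rescaled by the features. This is a minimal sketch of that identity with arbitrary illustrative data and keep probability, not the paper's general-loss analysis.

```python
import itertools

# Single sample (x, y), coefficients beta, keep probability p.
x = [1.0, 2.0, -1.5]
y = 3.0
beta = [0.5, -0.25, 1.0]
p = 0.8

# Exact expected squared loss under inverted dropout: each feature is
# kept with probability p and rescaled by 1/p, so the noised feature
# has mean x_j and the prediction is unbiased.
expected = 0.0
for mask in itertools.product([0, 1], repeat=len(x)):
    prob = 1.0
    pred = 0.0
    for m, xj, bj in zip(mask, x, beta):
        prob *= p if m else (1 - p)
        pred += (m / p) * xj * bj
    expected += prob * (y - pred) ** 2

# Closed form: clean loss + ((1 - p) / p) * sum_j x_j^2 beta_j^2,
# i.e. an l_2 penalty whose geometry is rescaled by the features.
clean = (y - sum(xj * bj for xj, bj in zip(x, beta))) ** 2
penalty = (1 - p) / p * sum((xj * bj) ** 2 for xj, bj in zip(x, beta))
assert abs(expected - (clean + penalty)) < 1e-9
```

The penalty term depends only on the inputs and coefficients, never on y, mirroring the review's observation that D does not depend on the labels.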
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes an approach to stochastic multi-objective optimization. The main idea is simply stated: optimize a single objective while treating the other objectives as constraints. The authors propose a primal-dual stochastic optimization algorithm to solve the problem and prove that it achieves (for the primal objective) the optimal 1/\sqrt{T} convergence rate. As far as I am concerned, the theory is solid and provides good insight into the problem of interest.
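The constrained reformulation the review describes can be illustrated on a toy deterministic problem: minimize f0(x) = x^2 subject to f1(x) = (x - 2)^2 <= 1, whose solution is x = 1. The updates below (gradient descent on the Lagrangian in the primal variable, projected gradient ascent in the multiplier) are a generic primal-dual sketch, not the paper's algorithm; the step size and iteration count are arbitrary choices.

```python
# Toy constrained problem: min x^2  s.t.  (x - 2)^2 <= 1  (optimum x = 1).
def grad_f0(x):
    return 2 * x

def f1(x):
    return (x - 2) ** 2 - 1  # constraint written in the form f1(x) <= 0

def grad_f1(x):
    return 2 * (x - 2)

x, lam, eta = 3.0, 0.0, 0.01
avg, T = 0.0, 20000
for t in range(T):
    # Primal step: descend the Lagrangian x^2 + lam * f1(x).
    x -= eta * (grad_f0(x) + lam * grad_f1(x))
    # Dual step: projected ascent keeps the multiplier nonnegative.
    lam = max(0.0, lam + eta * f1(x))
    avg += x
avg /= T
print(round(avg, 2))  # averaged iterate, close to the optimum x = 1
```

Averaging the iterates is what the convex-concave analysis controls; it is the averaged point, not the last iterate, that enjoys the 1/\sqrt{T}-type guarantee.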
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper proposes and analyzes a method for learning in robust MDPs. While this setting is very similar to learning in stochastic games, the main difference is that in stochastic games the optimal move of the opponent is observed, whereas in robust MDPs the decision maker only observes the outcome (the opponent chooses the transition probabilities). The paper makes a small advance on a relevant, non-trivial, and interesting topic, but I am not sure that it is quite ready for publication in its current form. First, the setting is somewhat contrived and unmotivated. A more natural setting would be simply to use reinforcement learning to learn to act in a robust setting.
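For context, the robust MDP objective the review refers to replaces the usual Bellman backup with a worst case over an uncertainty set U of transition models: V(s) = max_a min_{P in U} sum_s' P(s'|s,a) [r(s,a) + gamma V(s')]. The sketch below runs robust value iteration on a tiny two-state example with a finite uncertainty set; all numbers are illustrative inventions, not from the paper.

```python
# Robust value iteration on a 2-state, 2-action MDP.
# Uncertainty set: for each (s, a), a finite set of candidate transition
# distributions; the "opponent" picks the worst one for the agent.
GAMMA = 0.9
STATES, ACTIONS = [0, 1], [0, 1]
REWARD = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 2.0}
UNCERTAIN_P = {  # two candidate next-state distributions per (s, a)
    (0, 0): [[0.9, 0.1], [0.6, 0.4]],
    (0, 1): [[0.5, 0.5], [0.2, 0.8]],
    (1, 0): [[0.8, 0.2], [1.0, 0.0]],
    (1, 1): [[0.1, 0.9], [0.4, 0.6]],
}

V = [0.0, 0.0]
for _ in range(200):  # the robust backup is still a gamma-contraction
    V = [
        max(  # decision maker picks the best action ...
            min(  # ... against the worst transition model in the set
                REWARD[s, a] + GAMMA * sum(p * V[s2] for s2, p in enumerate(dist))
                for dist in UNCERTAIN_P[s, a]
            )
            for a in ACTIONS
        )
        for s in STATES
    ]
print([round(v, 3) for v in V])
```

The review's point is precisely that the learner never sees which distribution the inner min selected, only a sampled next state, which is what makes learning (as opposed to planning with a known U, as above) difficult.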
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper proposes a supervised learning algorithm that uses stochastic gradient descent and periodically expands the hypothesis space by introducing new basis functions and adding corresponding components to the weight vector. As it processes more data, it therefore fits more complex models. The hypothesis space considered here consists of polynomials, and higher-order monomials are gradually introduced into the model. The concept of growing the hypothesis space as more data arrives is not new (training kernel methods with SGD exhibits this behavior), but in the proposed method, choosing which monomials to add to the hypothesis space is very cheap.
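The growth scheme described in the review can be mimicked in a few lines: stream data through SGD on a polynomial model, and periodically append the next-degree monomial with a zero-initialized weight. This is a generic sketch of the idea with a fixed growth schedule, not the paper's (cheap, data-driven) rule for selecting which monomials to add.

```python
import random

random.seed(0)

def target(x):
    return 1.0 + 2.0 * x + 0.5 * x ** 2  # ground-truth polynomial

weights = [0.0]           # start with degree 0 (a constant basis function)
eta, max_degree = 0.01, 3

losses = []
for step in range(1, 6001):
    x = random.uniform(-1, 1)
    y = target(x)
    feats = [x ** d for d in range(len(weights))]
    pred = sum(w * f for w, f in zip(weights, feats))
    err = pred - y
    losses.append(err * err)
    # SGD step on the squared loss.
    weights = [w - eta * 2 * err * f for w, f in zip(weights, feats)]
    # Periodically grow the hypothesis space with the next monomial;
    # the new component starts at zero, so the current fit is unchanged.
    if step % 1500 == 0 and len(weights) <= max_degree:
        weights.append(0.0)

early = sum(losses[:500]) / 500
late = sum(losses[-500:]) / 500
assert late < early  # richer model + more data reduces the streaming loss
```

Zero-initializing the new component is the key to seamless growth: expanding the weight vector never increases the current loss, so the model complexity can ratchet up as data arrives.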
To Reviewer
It seems you misunderstood some key points and details; we hope our explanation below helps clarify the misunderstandings and confusion. By "specific learning rate schedule", we think
We think the empirical evidence is sufficient to verify our theoretical claims, and this is exactly the case here. Figure 1(b) in [Triantafillou et al. 2020] shows that the increase of shots
For your other comments: 1) The inner-task gap vanishes because the expectation of the loss function w.r.t.