AITopics | model evaluation

model evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

All that structure matches does not glitter

Neural Information Processing Systems, Jun-11-2026, 16:21:44 GMT

Generative models for materials, especially inorganic crystals, have the potential to transform the theoretical prediction of novel compounds and structures. Progress in this field depends critically on robust benchmarks and minimal, information-rich datasets that enable meaningful model evaluation. This paper critically examines common datasets and reported metrics for a crystal structure prediction task: generating the most likely structures given the chemical composition of a material. We focus on three key issues. First, materials datasets should contain unique crystal structures; for example, we show that the widely used carbon-24 dataset contains only ≈40% unique structures. Second, materials datasets should not be split randomly when many compositions have numerous polymorphs, which we find to be the case for the perov-5 and MP-20 datasets.
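
The two dataset checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes each structure has already been reduced to a hashable canonical fingerprint (in practice a structure-matching tool such as pymatgen's StructureMatcher would decide which structures are equivalent), and the function names `unique_fraction` and `composition_grouped_split` are hypothetical. The split assigns whole compositions, rather than individual structures, to train or test, so polymorphs of one composition never straddle the split.

```python
import random
from collections import defaultdict


def unique_fraction(fingerprints):
    """Fraction of distinct structures, given one canonical fingerprint per entry."""
    if not fingerprints:
        return 0.0
    return len(set(fingerprints)) / len(fingerprints)


def composition_grouped_split(records, test_fraction=0.2, seed=0):
    """Split (composition, structure_id) records by composition.

    Every polymorph of a given composition lands in the same split, so the
    test set never contains a composition that was seen during training.
    """
    groups = defaultdict(list)
    for composition, structure_id in records:
        groups[composition].append(structure_id)
    compositions = sorted(groups)
    random.Random(seed).shuffle(compositions)
    n_test = max(1, round(test_fraction * len(compositions)))
    test_comps = set(compositions[:n_test])
    train, test = [], []
    for composition, ids in groups.items():
        target = test if composition in test_comps else train
        target.extend((composition, sid) for sid in ids)
    return train, test
```

A random split over individual structures, by contrast, would routinely place one polymorph of a composition in training and another in testing, which rewards memorizing compositions rather than predicting structures.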

artificial intelligence, dataset, proceedings

Technology: Information Technology > Artificial Intelligence (0.39)

Automatic Unsupervised Outlier Model Selection

Neural Information Processing Systems, Apr-25-2026, 03:11:56 GMT

Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)? In this work, we tackle the unsupervised outlier model selection (UOMS) problem, and propose METAOD, a principled, data-driven approach to UOMS based on meta-learning. The UOMS problem is notoriously challenging, as compared to model selection for classification and clustering, since (i) model evaluation is infeasible due to the lack of hold-out data with labels, and (ii) model comparison is infeasible due to the lack of a universal objective function. METAOD capitalizes on the performances of a large body of detection models on historical outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without any labels, model evaluations or model comparisons. To capture task similarity within our meta-learning framework, we introduce specialized metafeatures that quantify outlying characteristics of a dataset. Extensive experiments show that selecting a model by METAOD significantly outperforms no model selection (e.g.
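
A simplified way to carry prior benchmark performance over to a new, unlabeled dataset is to pick the model that did best on the most similar historical datasets, with similarity measured in meta-feature space. The sketch below is a nearest-neighbour stand-in, not METAOD itself; the function `select_model`, the Euclidean distance, and the toy numbers are all assumptions for illustration. Like the approach described above, it needs no labels, model evaluations, or model comparisons on the new dataset, only that dataset's meta-features.

```python
import math


def select_model(perf, meta_features, new_meta, k=1):
    """Return the index of the model with the best average historical performance
    on the k historical datasets closest to the new dataset in meta-feature space.

    perf[i][j]: performance of model j on historical dataset i (higher is better).
    meta_features[i]: meta-feature vector of historical dataset i.
    new_meta: meta-feature vector of the new, unlabeled dataset.
    """
    dists = [math.dist(m, new_meta) for m in meta_features]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    n_models = len(perf[0])
    avg = [sum(perf[i][j] for i in nearest) / len(nearest) for j in range(n_models)]
    return max(range(n_models), key=avg.__getitem__)
```

With `perf = [[0.9, 0.5, 0.6], [0.4, 0.8, 0.7]]` and meta-features `[[0, 0], [10, 10]]`, a new dataset with meta-features `[1.0, 0.5]` is matched to the first historical dataset, so model 0 is selected.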

artificial intelligence, data mining, machine learning

Country:

Europe (0.46)
North America > United States (0.28)

Genre:

Research Report (0.68)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

the Hamiltonian bound

Neural Information Processing Systems, Apr-24-2026, 11:51:17 GMT

Algorithm 6: Generating the (non-differentiable) Hamiltonian AIS variational bound.

Figure 1 shows the results. The first row shows the results obtained by tuning the pair (,η) together with each other parameter individually, for different values of K; the second row shows the results obtained by tuning increasingly more parameters. Tuning β and q(z) yields the largest gains in performance.

Figure 4: Tuning more parameters leads to significantly better results.
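
For readers unfamiliar with the bound, the sketch below shows plain annealed importance sampling (AIS) on a 1-D toy problem. It is an assumption-laden illustration, not Algorithm 6: it uses random-walk Metropolis transitions instead of Hamiltonian dynamics, a linear β schedule, and a fixed q(z) = N(0, 1), and the target and all function names are hypothetical (`n_steps` presumably plays the role of K). The average log-weight is a lower bound on log Z in expectation (the AIS variational bound), while the log-mean-exp of the weights estimates log Z itself.

```python
import math
import random


def log_normal(z, mean=0.0, std=1.0):
    """Log-density of N(mean, std^2) at z."""
    return -0.5 * ((z - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def log_annealed(z, beta, log_target):
    """Unnormalized log-density of pi_beta ∝ q^(1 - beta) * f^beta, with q = N(0, 1)."""
    return (1.0 - beta) * log_normal(z) + beta * log_target(z)


def ais_log_weights(log_target, n_samples=500, n_steps=200, step_size=1.0, seed=0):
    """AIS from q = N(0, 1) to an unnormalized target f, using random-walk
    Metropolis transitions as a stand-in for Hamiltonian dynamics."""
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]  # linear schedule from 0 to 1
    log_ws = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)  # z_0 ~ q
        log_w = 0.0
        for k in range(1, n_steps + 1):
            # Incremental weight: (beta_k - beta_{k-1}) * [log f(z) - log q(z)].
            log_w += (betas[k] - betas[k - 1]) * (log_target(z) - log_normal(z))
            # One Metropolis step that leaves pi_{beta_k} invariant.
            proposal = z + rng.gauss(0.0, step_size)
            log_ratio = (log_annealed(proposal, betas[k], log_target)
                         - log_annealed(z, betas[k], log_target))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                z = proposal
        log_ws.append(log_w)
    return log_ws


def log_mean_exp(xs):
    """Numerically stable log of the mean of exp(xs)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def log_target(z):
    """Unnormalized N(1, 1); its true normalizer is Z = sqrt(2 * pi)."""
    return -0.5 * (z - 1.0) ** 2


log_ws = ais_log_weights(log_target)
elbo = sum(log_ws) / len(log_ws)  # AIS variational bound: E[log w] <= log Z
log_z_hat = log_mean_exp(log_ws)  # AIS estimate of log Z
```

Because the toy target is N(1, 1) up to normalization, the true value is log Z = ½ log 2π ≈ 0.919, so the gap between `elbo` and `log_z_hat` can be checked directly; more annealing steps or better-mixing transitions shrink it.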

artificial intelligence, machine learning, simulation

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

[Untitled entry; figure residue only: Acc and ECE estimates, MAE vs Oracle, across subgroups (age 25-35, 35-50, 50-65, 65-80)]

Neural Information Processing Systems, Apr-24-2026, 08:55:55 GMT

Evaluating the performance of machine learning models on diverse and underrepresented subgroups is essential for ensuring fairness and reliability in real-world applications. However, accurately assessing model performance is challenging for two main reasons: (1) test data are scarce, especially for small subgroups, and (2) the model's deployment setting may undergo distributional shifts that the available test data do not reflect. In this work, we introduce 3S Testing, a deep generative modeling framework that facilitates model evaluation by generating synthetic test sets for small subgroups and simulating distributional shifts. Our experiments demonstrate that 3S Testing outperforms traditional baselines, including real test data alone, in estimating model performance on minority subgroups and under plausible distributional shifts. In addition, 3S Testing provides intervals around its performance estimates that achieve better coverage of the ground truth than existing approaches. Overall, these results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.
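
A minimal sketch of evaluating a model on a synthetic test set for a small subgroup follows. It swaps the deep generative model for a per-class 1-D Gaussian fitted to the few real subgroup samples, so it only illustrates the idea and is not 3S Testing; the function names and the `class_weights` override (a crude way to simulate label shift) are hypothetical.

```python
import random
import statistics


def fit_class_gaussians(xs, ys):
    """Fit a 1-D Gaussian per class from a small real subgroup sample.

    Returns {label: (mean, std, class_proportion)}; each class needs >= 2 points.
    """
    params = {}
    for label in sorted(set(ys)):
        vals = [x for x, y in zip(xs, ys) if y == label]
        params[label] = (statistics.fmean(vals), statistics.stdev(vals), len(vals) / len(xs))
    return params


def synthetic_accuracy(model, params, n=5000, class_weights=None, seed=0):
    """Accuracy of `model` on n synthetic (x, y) pairs drawn from the fitted Gaussians.

    class_weights (hypothetical) replaces the empirical class proportions,
    which simulates a label shift in the deployment setting.
    """
    rng = random.Random(seed)
    labels = list(params)
    if class_weights:
        weights = [class_weights[lbl] for lbl in labels]
    else:
        weights = [params[lbl][2] for lbl in labels]
    correct = 0
    for _ in range(n):
        y = rng.choices(labels, weights=weights)[0]
        mean, std, _ = params[y]
        correct += model(rng.gauss(mean, std)) == y
    return correct / n


# Six real samples from a small subgroup, and a simple threshold classifier.
xs = [1.8, 2.3, 2.1, -1.9, -2.2, -2.0]
ys = [1, 1, 1, 0, 0, 0]
params = fit_class_gaussians(xs, ys)


def sign_model(x):
    return int(x > 0)


acc = synthetic_accuracy(sign_model, params)
```

The synthetic set can be made far larger than the six real points, which is the point of the exercise; how faithful the estimate is depends entirely on how well the generative model captures the subgroup.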

artificial intelligence, machine learning, natural language

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)

Weak Supervision Performance Evaluation via Partial Identification

Neural Information Processing Systems, Mar-22-2026, 21:17:21 GMT

Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels, utilizing weak labels from heuristics, crowdsourcing, or pre-trained models. However, the absence of ground truth complicates model evaluation, as traditional metrics such as accuracy, precision, and recall cannot be directly calculated. In this work, we present a novel method to address this challenge by framing model evaluation as a partial identification problem and estimating performance bounds using Fréchet bounds. Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques. Through scalable convex optimization, we obtain accurate and computationally efficient bounds for metrics including accuracy, precision, recall, and F1-score, even in high-dimensional settings. This framework offers a robust approach to assessing model quality without ground truth labels, enhancing the practicality of weakly supervised learning for real-world applications.
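
In the simplest setting, with no covariates, Fréchet bounds already show why accuracy is only partially identified. If only p = P(Y=1) and q = P(Ŷ=1) are known, the joint cell P(Y=1, Ŷ=1) can lie anywhere in [max(0, p + q - 1), min(p, q)], and accuracy, which equals 2·P(Y=1, Ŷ=1) + 1 - p - q, inherits that range. The sketch below computes these bounds; it is a simplified illustration that assumes both marginals are known exactly, whereas the method described above bounds metrics conditioned on features and solves a convex program. The function name is hypothetical.

```python
def frechet_accuracy_bounds(p_y, p_pred):
    """Bounds on binary accuracy when only the marginals are known.

    p_y = P(Y = 1) and p_pred = P(Yhat = 1); the joint distribution is unknown.
    """
    # Fréchet bounds on the joint cell P(Y = 1, Yhat = 1).
    low_11 = max(0.0, p_y + p_pred - 1.0)
    high_11 = min(p_y, p_pred)
    # Accuracy = P(1, 1) + P(0, 0) = 2 * P(1, 1) + 1 - p_y - p_pred, increasing in P(1, 1).
    base = 1.0 - p_y - p_pred
    return base + 2.0 * low_11, base + 2.0 * high_11
```

For example, with p = 0.3 and q = 0.4 the accuracy can be anywhere from 0.3 to 0.9; in general the interval simplifies to [|p + q - 1|, 1 - |p - q|].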

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

VISA: Variational Inference with Sequential Sample-Average Approximations

Neural Information Processing Systems, Feb-18-2026, 19:07:21 GMT

We perform experiments on high-dimensional Gaussians, Lotka-Volterra dynamics, and a Pickover attractor.

approximation, artificial intelligence, machine learning

Country:

Asia > Middle East > Jordan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

874067cee9c98dc9b9980fa6ef70176a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 09:30:56 GMT

benchmark, machine learning, natural language

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

6a55f024db3f771194bdadc8f3a35381-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-13-2026, 10:05:20 GMT

accuracy, artificial intelligence, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels

Neural Information Processing Systems, Feb-13-2026, 10:04:03 GMT

The DiscGraph set captures a wide range of diverse graph data distribution discrepancies through a discrepancy measurement function that exploits GNN outputs related to latent node embeddings and node class predictions.
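
To make the idea of a discrepancy measurement function concrete, here is a hypothetical example that combines the two kinds of GNN output mentioned above: the distance between the mean node embeddings of two graphs and the total variation distance between their average predicted class distributions. The function names, the weighting `alpha`, and both distance choices are assumptions; GNNEvaluator's actual function may differ.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def graph_discrepancy(emb_src, probs_src, emb_tgt, probs_tgt, alpha=1.0):
    """Hypothetical discrepancy between a source graph and an unseen target graph,
    computed from a GNN's node embeddings and node class-probability outputs."""
    # Euclidean distance between mean node embeddings (a linear-kernel MMD).
    emb_term = math.dist(mean_vector(emb_src), mean_vector(emb_tgt))
    # Total variation distance between average predicted class distributions.
    p, q = mean_vector(probs_src), mean_vector(probs_tgt)
    pred_term = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return emb_term + alpha * pred_term
```

Identical graphs give a discrepancy of zero; a target graph whose embeddings drift or whose predictions collapse onto one class scores higher, which is the kind of signal a label-free evaluator could learn from.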

data mining, machine learning, natural language

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Oceania > Australia (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Actively Testing Your Model While It Learns: Realizing Label-Efficient Learning in Practice (Dayou Yu)

Neural Information Processing Systems, Feb-12-2026, 19:17:00 GMT

In active learning (AL), we focus on reducing the data annotation cost from the model training perspective. However, "testing", which often refers to the model

artificial intelligence, evaluation, machine learning

Country: