AITopics | benchmark task

Collaborating Authors

benchmark task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models

Neural Information Processing Systems, Jun-18-2026, 15:37:26 GMT

Simulation-based inference (SBI) offers a flexible and general approach to performing Bayesian inference: In SBI, a neural network is trained on synthetic data simulated from a model and then used to rapidly infer posterior distributions for observed data. A key goal for SBI is to achieve accurate inference with as few simulations as possible, especially for expensive simulators. In this work, we address this challenge by repurposing recent probabilistic foundation models for tabular data: We show how tabular foundation models, specifically TabPFN, can be used as pre-trained autoregressive conditional density estimators for SBI. We propose Neural Posterior Estimation with Prior-data Fitted Networks (NPE-PFN) and show that it is competitive with current SBI approaches in terms of accuracy on both benchmark tasks and two complex scientific inverse problems. Crucially, it often substantially outperforms them in terms of simulation efficiency, sometimes requiring orders of magnitude fewer simulations. NPE-PFN eliminates the need to select and train an inference network and tune its hyperparameters. We also show that it is more robust to model misspecification and can be scaled to simulation budgets that exceed the context size limit of TabPFN. NPE-PFN provides a new direction for SBI, in which training-free, general-purpose inference models offer efficient, easy-to-use, and flexible solutions for a wide range of stochastic inverse problems.
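The neural posterior estimation loop described above (simulate parameter/data pairs, fit a conditional density estimator q(θ | x), then sample it at the observed data) can be sketched as follows. This is not NPE-PFN and does not call TabPFN: as a hypothetical stand-in for the pre-trained model, each parameter dimension is fitted with a linear-Gaussian regression on x and the preceding parameters, which keeps the autoregressive factorization p(θ | x) = ∏ᵢ p(θᵢ | θ₍<ᵢ₎, x) the abstract refers to. The simulator, prior, and budget are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, rng):
    # Hypothetical toy simulator: x = theta + Gaussian noise with std 0.5.
    return theta + 0.5 * rng.standard_normal(theta.shape)

def fit_autoregressive_gaussian(theta, x):
    """Fit q(theta_i | x, theta_<i) = N(w_i . [1, x, theta_<i], s_i^2) for each dimension i."""
    n, d = theta.shape
    models = []
    for i in range(d):
        feats = np.column_stack([np.ones(n), x, theta[:, :i]])
        w, *_ = np.linalg.lstsq(feats, theta[:, i], rcond=None)
        resid = theta[:, i] - feats @ w
        models.append((w, resid.std()))  # mean weights and residual std
    return models

def sample_posterior(models, x_obs, n_samples, rng):
    """Draw theta ~ q(theta | x_obs) one dimension at a time (autoregressively)."""
    samples = np.zeros((n_samples, len(models)))
    x_rep = np.tile(x_obs, (n_samples, 1))
    for i, (w, s) in enumerate(models):
        feats = np.column_stack([np.ones(n_samples), x_rep, samples[:, :i]])
        samples[:, i] = feats @ w + s * rng.standard_normal(n_samples)
    return samples

# Simulation budget of 1,000 draws from a standard normal prior.
theta_train = rng.standard_normal((1000, 2))
x_train = simulate(theta_train, rng)
models = fit_autoregressive_gaussian(theta_train, x_train)

x_obs = np.array([1.0, -0.5])
post = sample_posterior(models, x_obs, 5000, rng)
print(post.mean(axis=0), post.std(axis=0))
```

For this conjugate toy problem the exact posterior has mean 0.8·x_obs and standard deviation √0.2 ≈ 0.447 per dimension, so the sketch can be checked against a closed form; a real SBI problem would need a far more flexible estimator, which is the role the abstract assigns to TabPFN.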

machine learning, natural language, simulation, (18 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Information Technology (0.93)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


a76a757ed479a1e6a5f8134bea492f83-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Apr-29-2026, 07:39:18 GMT

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (0.67)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Public Health (1.00)
(11 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)


Supplementary Information: TARTARUS: Practical and Realistic Benchmarks for Inverse Molecular Design

Neural Information Processing Systems, Apr-24-2026, 14:34:16 GMT

S1. INTRODUCTION

Traditionally, property-guided optimization has relied on expert intuition [1] and several rounds of trial, error, and human-inspired optimization, occasionally supported by computer simulations. Alternatively, computer-assisted approaches have employed virtual screening [2] or optimization algorithms such as genetic algorithms (GAs) [3-5]. More recently, with the surge of deep learning, deep generative models specifically designed to operate in chemical space and tackle inverse molecular design have emerged [6-8]. This has led to the development of numerous algorithmic approaches for this purpose, the most popular including variational autoencoders (VAEs) [9, 10], generative adversarial networks (GANs) [11, 12], and reinforcement learning (RL) [13, 14].

METHODS OVERVIEW

In this section, we provide an overview of the molecular generative models employed throughout this work and summarize the design choices we needed to make during their implementation. The molecular design algorithms we considered are VAEs, long short-term memory hill climbing (LSTM-HC) models [15-17], REINVENT [18], JANUS [19], and a graph-based genetic algorithm (GB-GA) [20]. At the core of most of these approaches are molecular string representations, the most commonly used of which is the Simplified Molecular Input Line Entry System (SMILES) [21]. Accordingly, many of the algorithms tested propose structures by predicting subsequent characters from partial strings. However, SMILES-based algorithms frequently produce invalid strings that do not represent molecules, which is problematic for both the robustness and the interpretability of the corresponding methods [22, 23]. This issue was addressed systematically by the introduction of Self-Referencing Embedded Strings (SELFIES) [22], a molecular string representation that guarantees validity. Thus, unlike with SMILES, every arbitrary combination of SELFIES characters represents a molecule. Nevertheless, the impact of SELFIES on structure optimization has not yet been evaluated systematically [23]. To address this gap, we modify some of the existing SMILES-based generative models to also be compatible with SELFIES and test their performance depending on the representation, similar to recent work [24]. Among the models tested, REINVENT, the VAEs, and the LSTM-HC models use recurrent neural networks (RNNs) [25] to learn the conditional probability distributions of the characters that represent molecules. RNNs are a class of artificial neural networks (ANNs) that use sequential information from their previous predictions and states.
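The idea of generating molecules by predicting the next character of a partial SMILES string, and why this can yield invalid strings, can be illustrated with a deliberately tiny model. The sketch below replaces the RNNs mentioned above with a character bigram model, p(next | previous character), trained on a handful of hypothetical SMILES strings; it is not any of the benchmarked models. The parenthesis check is only a necessary condition for validity (ring-closure digits and valences also matter), and multi-character atoms such as "Cl" would be split into separate characters here.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"
# Hypothetical training strings, chosen only for illustration.
TRAIN = ["CCO", "CC(=O)O", "c1ccccc1", "CC(C)O", "OCC(O)CO"]

def fit_bigram(smiles_list):
    """Estimate p(next char | previous char) from character bigram counts."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        chars = [START] + list(s) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return {prev: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for prev, ctr in counts.items()}

def sample(model, rng, max_len=40):
    """Generate one string character by character until END or max_len."""
    out, prev = [], START
    while len(out) < max_len:
        chars, probs = zip(*model[prev].items())
        nxt = rng.choices(chars, weights=probs)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

def parentheses_balanced(s):
    """Crude necessary (not sufficient) validity check for SMILES branches."""
    depth = 0
    for c in s:
        depth += (c == "(") - (c == ")")
        if depth < 0:
            return False
    return depth == 0

model = fit_bigram(TRAIN)
rng = random.Random(0)
samples = [sample(model, rng) for _ in range(20)]
invalid = [s for s in samples if not parentheses_balanced(s)]
print(samples)
print(f"{len(invalid)} of {len(samples)} samples fail the branch check")
```

Because the model only sees one previous character, it has no memory of open branches, which is the kind of failure a grammar-free string generator can exhibit; a SELFIES-based decoder avoids it by construction.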

artificial intelligence, machine learning, molecule, (17 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Energy > Renewable (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


09f8b2469a3d1089a7c60d9ef1983271-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Apr-24-2026, 14:34:14 GMT

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
North America > Canada > Ontario (0.28)

Genre:

Workflow (0.95)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals (0.67)
Energy > Renewable > Solar (0.47)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.93)
(2 more...)


Simple random search of static linear policies is competitive for reinforcement learning

Neural Information Processing Systems, Mar-16-2026, 22:28:05 GMT

Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. We introduce a model-free random search algorithm for training static, linear policies for continuous control problems. Common evaluation methodology shows that our method matches state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Nonetheless, more rigorous evaluation reveals that the assessment of performance on these benchmarks is optimistic. We evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task. This extensive evaluation is possible because of the small computational footprint of our method. Our simulations highlight a high variability in performance in these benchmark tasks, indicating that commonly used estimations of sample efficiency do not adequately evaluate the performance of RL algorithms. Our results stress the need for new baselines, benchmarks and evaluation methodology for RL algorithms.
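A random search over static linear policies u = Mx can be sketched in a few lines, which also shows why such a method has a small computational footprint. The sketch below is only in the spirit of the method described above: it perturbs the policy matrix along random directions, compares the rewards of the positive and negative perturbations, and scales the step by the standard deviation of the collected rewards. Refinements such as state normalization or keeping only the best directions are omitted, and the double-integrator dynamics, quadratic cost, and step sizes are hypothetical choices, not the MuJoCo setup from the abstract.

```python
import numpy as np

# Hypothetical linear system (double integrator, dt = 0.1) used as a stand-in task.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

def rollout_reward(M, horizon=50):
    """Total reward of the static linear policy u = M x from a fixed start state."""
    x = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        u = M @ x
        total -= x @ x + 0.1 * (u @ u)  # negative quadratic cost
        x = A @ x + B @ u
    return total

def basic_random_search(n_iters=100, n_dirs=8, step=0.02, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    M = np.zeros((1, 2))
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, *M.shape))
        r_plus = np.array([rollout_reward(M + noise * d) for d in deltas])
        r_minus = np.array([rollout_reward(M - noise * d) for d in deltas])
        # Scale by the std of all rewards collected this iteration.
        sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8
        # Move along directions whose positive perturbation scored higher.
        M = M + step / (n_dirs * sigma) * np.einsum("k,kij->ij", r_plus - r_minus, deltas)
    return M

M = basic_random_search()
print("learned gains:", M)
print("reward before:", rollout_reward(np.zeros((1, 2))), "after:", rollout_reward(M))
```

Each iteration costs only 2 × `n_dirs` rollouts and no gradients, so rerunning it across many seeds and hyperparameter settings, as the abstract advocates, is cheap; varying `seed` here gives a quick feel for run-to-run variability.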

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.65)


Appendix A Additional results This appendix section shows additional results and corresponding plots to support the insights

Neural Information Processing Systems, Feb-17-2026, 12:24:05 GMT

Section A.2 shows results using a chat-style verbalized numeric […] Section A.3 shows results on four extra benchmark tasks made available with […] Finally, Section A.5 presents and discusses results on feature […]

In this section, we evaluate risk score calibration on the income prediction task across different subpopulations, as is typically done as part of a fairness audit. Figures A1-A2 show group-conditional calibration curves for all models on the ACSIncome task, evaluated on three subgroups specified by the race attribute in the ACS data. We show the three race categories with the largest representation. The 'Mixtral 8x22B' and 'Yi 34B' models shown are the worst offenders: samples belonging to the 'Black' population consistently receive lower scores for the same positive-label probability than samples from the 'Asian' or 'White' populations. On average, the 'Mixtral 8x22B (it)' model classifies a Black individual with a […] In fact, this score bias can be reversed for some base models, which overestimate scores for Black individuals compared with other subgroups.
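Group-conditional calibration curves like those described above compare, within each subgroup, the average predicted score with the observed positive rate across score bins. A minimal sketch follows, assuming equal-width bins and NumPy arrays of scores, binary labels, and group identifiers; it is not the evaluation code behind Figures A1-A2. The synthetic data is hypothetical: group "A" is calibrated by construction, while group "B" receives scores shifted down by 0.1, mimicking the under-scoring pattern the text reports.

```python
import numpy as np

def group_calibration_curves(scores, labels, groups, n_bins=10):
    """Return {group: (mean score per bin, positive rate per bin)}, skipping empty bins."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin ids 0..n_bins-1 (1.0 lands in the last bin).
    bin_ids = np.digitize(scores, edges[1:-1])
    curves = {}
    for g in np.unique(groups):
        mask = groups == g
        mean_scores, pos_rates = [], []
        for b in range(n_bins):
            in_bin = mask & (bin_ids == b)
            if in_bin.any():
                mean_scores.append(scores[in_bin].mean())
                pos_rates.append(labels[in_bin].mean())
        curves[g] = (np.array(mean_scores), np.array(pos_rates))
    return curves

# Hypothetical synthetic audit data.
rng = np.random.default_rng(0)
n = 20_000
p_true = rng.uniform(0.0, 1.0, n)
labels = rng.random(n) < p_true
groups = rng.choice(["A", "B"], size=n)
scores = np.where(groups == "B", np.clip(p_true - 0.1, 0.0, 1.0), p_true)

curves = group_calibration_curves(scores, labels, groups)
for g, (s, r) in curves.items():
    print(g, "mean gap (positive rate - score):", round(float(np.mean(r - s)), 3))
```

A curve lying above the diagonal (positive rate higher than the mean score) means that group is systematically under-scored; plotting each group's pair of arrays against the diagonal reproduces the kind of figure the appendix describes.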

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand (0.04)
North America > United States > California (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)


b0a4b3e384b4554e65a47ad1f6b0310a-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-17-2026, 12:24:02 GMT

data mining, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > California (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.92)
Questionnaire & Opinion Survey (0.68)

Industry:

Government (0.92)
Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
(2 more...)


Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation. Julius Vetter, Guy Moss

Neural Information Processing Systems, Feb-17-2026, 02:20:53 GMT

Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations, an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid sources, we propose an approach that targets the maximum entropy distribution, i.e., one that retains as much uncertainty as possible.
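The ill-posedness described above, and why maximum entropy gives a principled tie-breaker, can be seen in a one-dimensional toy case. This is an illustration only, not the Sourcerer algorithm: the noise-free simulator x = |θ|, the two uniform candidate sources, and the nearest-neighbour (Kozachenko-Leonenko, k = 1) entropy estimator are all assumptions made for the example. Both sources push forward to the same data distribution, uniform on [0, 1], yet their entropies differ by log 2.

```python
import numpy as np

def knn_entropy_1d(samples):
    """Kozachenko-Leonenko (k=1) differential entropy estimate for 1-D samples, in nats."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    gaps = np.diff(x)
    # Nearest-neighbour distance: the smaller of the left and right gaps (one side at the ends).
    left = np.concatenate([[np.inf], gaps])
    right = np.concatenate([gaps, [np.inf]])
    r = np.minimum(left, right)
    # psi(n) - psi(1) equals the harmonic number H_{n-1}; log(2) is log of the 1-D unit-ball volume.
    harmonic = np.sum(1.0 / np.arange(1, n))
    return harmonic + np.log(2.0) + np.mean(np.log(r))

def simulator(theta):
    # Hypothetical noise-free simulator that cannot distinguish theta from -theta.
    return np.abs(theta)

rng = np.random.default_rng(0)
n = 5000
source_pos = rng.uniform(0.0, 1.0, n)   # candidate source supported on [0, 1]
source_sym = rng.uniform(-1.0, 1.0, n)  # candidate source supported on [-1, 1]

x_pos, x_sym = simulator(source_pos), simulator(source_sym)
h_pos, h_sym = knn_entropy_1d(source_pos), knn_entropy_1d(source_sym)
print("pushforward quartiles:", np.quantile(x_pos, [0.25, 0.5, 0.75]), np.quantile(x_sym, [0.25, 0.5, 0.75]))
print(f"source entropies: {h_pos:.3f} vs {h_sym:.3f} nats (exact: 0 and log 2 = {np.log(2):.3f})")
```

Since both sources reproduce the data equally well, a maximum-entropy criterion prefers the symmetric source on [-1, 1], which commits to the least information about θ beyond what the observations support.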

artificial intelligence, bayesian inference, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Oceania > Australia (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Government (0.93)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)


How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Neural Information Processing Systems, Feb-16-2026, 18:49:34 GMT

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly because of the tight coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impact of model shift, which can degrade performance through excessive model updates. Other methods use a performance difference bound to explicitly account for model shift. However, these methods rely on a fixed threshold to constrain model shift, which makes them heavily dependent on that threshold and unable to adapt during training. In this paper, we theoretically derive an optimization objective that unifies model shift and model bias, and then formulate a fine-tuning process. This process adaptively adjusts the model updates to obtain a performance improvement guarantee while avoiding model overfitting.

machine learning, model shift, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

