Limitations
While our study identifies clear separations between model hypothesis classes, our best models still have not reached the consistency ceiling of the neural and behavioral benchmarks we compared against. All models were trained simultaneously across all eight scenarios of the Physion Dynamics Training Set, constituting around 16,000 total training scenarios (2,000 scenes per scenario) [Bear et al., 2021]. For each stimulus, we compute the proportion of "hit" responses. The Correlation to Average Human Response is the Pearson's correlation between the model probability-hit vector and the human proportion-hit vector, across stimuli per scenario. OCP Accuracy of humans and models is the average accuracy across stimuli per scenario. To give the final values of the two quantities, we then compute the weighted mean and s.e.m. of the above per-scenario values. Note that these values are therefore different for each condition, but always the same across all models. All neural predictivities are reported on held-out conditions and their timepoints.
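As a rough illustration of the two metrics described above (a minimal sketch, not the authors' implementation; the function names are hypothetical, and the s.e.m. formula assumes a simple weighted-variance estimate):

```python
import numpy as np

def correlation_to_avg_human(model_hit_prob, human_hit_prop):
    # Pearson's r between the model probability-hit vector and the
    # human proportion-hit vector, across stimuli within one scenario.
    return float(np.corrcoef(model_hit_prob, human_hit_prop)[0, 1])

def weighted_mean_sem(per_scenario_values, n_stimuli_per_scenario):
    # Weighted (by stimulus count) mean and approximate s.e.m.
    # across scenarios.
    v = np.asarray(per_scenario_values, dtype=float)
    w = np.asarray(n_stimuli_per_scenario, dtype=float)
    mean = float(np.average(v, weights=w))
    var = float(np.average((v - mean) ** 2, weights=w))
    sem = float(np.sqrt(var / len(v)))
    return mean, sem
```

For example, two scenarios with accuracies 0.6 and 0.8 on 100 and 300 stimuli respectively give a weighted mean of 0.75.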
Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories
Fircă, Liviu Nicolae, Bărbălau, Antonio, Oneata, Dan, Burceanu, Elena
Can models generalize attribute knowledge across semantically and perceptually dissimilar categories? While prior work has addressed attribute prediction within narrow taxonomic or visually similar domains, it remains unclear whether current models can abstract attributes and apply them to conceptually distant categories. This work presents the first explicit evaluation of the robustness of the attribute prediction task under such conditions, testing whether models can correctly infer shared attributes between unrelated object types: e.g., identifying that the attribute "has four legs" is common to both "dogs" and "chairs". To enable this evaluation, we introduce train-test split strategies that progressively reduce the correlation between training and test sets, based on LLM-driven semantic grouping, embedding-similarity thresholding, embedding-based clustering, and supercategory-based partitioning using ground-truth labels. Results show a sharp drop in performance as the correlation between training and test categories decreases, indicating strong sensitivity to split design. Among the evaluated methods, clustering yields the most effective trade-off, reducing hidden correlations while preserving learnability. These findings offer new insights into the limitations of current representations and inform future benchmark construction for attribute reasoning.
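The embedding-based clustering split strategy can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: `kmeans` and `clustering_split` are hypothetical names, the k-means here is a minimal numpy version, and a real pipeline would use pretrained category embeddings. Whole clusters are assigned to the test side, so that semantically similar categories never straddle the split:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Minimal Lloyd's k-means over category embeddings.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def clustering_split(cat_embeddings, test_frac=0.3, k=10, seed=0):
    # Assign whole clusters to the test set until roughly
    # test_frac of the categories are held out.
    labels = kmeans(cat_embeddings, k, seed=seed)
    rng = np.random.default_rng(seed)
    test_clusters, n_test = set(), 0
    for c in rng.permutation(k):
        if n_test >= test_frac * len(cat_embeddings):
            break
        test_clusters.add(int(c))
        n_test += int((labels == c).sum())
    test_mask = np.isin(labels, list(test_clusters))
    return ~test_mask, test_mask
```

Because test membership is granted cluster-by-cluster, lowering the number of clusters k makes the held-out groups coarser and the split harder, which is the knob the splits above vary.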
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
- North America > United States > California (0.04)
Supplementary: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning A Analyzing the model bias for selecting train-test splits
These settings are used throughout our study. In Tab. 1 we show the measured FID scores; for each dataset we show examples of an easy, medium and hard train-test split. Tab. 2 illustrates the FID scores for all pairwise combinations. However, the fact that FID scores are relatively close to one another despite large semantic differences between the datasets may indicate a limitation of our utilised FID estimator. This section provides additional results for the experiments presented in Sec. 4 of the main paper. To this end, we provide the exact performance values used to visualize Figure 1 of the main paper.
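For reference, the FID between two feature sets is the Fréchet distance between Gaussians fit to each set, FID = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2(S_a S_b)^{1/2}). A minimal numpy sketch (not the paper's estimator; it uses the eigenvalue identity Tr((S_a S_b)^{1/2}) = sum_i sqrt(lambda_i(S_a S_b)) instead of an explicit matrix square root):

```python
import numpy as np

def fid(feats_a, feats_b):
    # Frechet distance between Gaussians fit to two feature sets.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    diff = mu_a - mu_b
    # Tr((Sa Sb)^{1/2}) = sum of sqrt of eigenvalues of Sa @ Sb;
    # clip tiny negative numerical eigenvalues to zero.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Identical feature sets give an FID of zero, while a pure mean shift of 5 in each of d dimensions gives exactly 25d, which is a useful sanity check when comparing splits.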
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > California (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (3 more...)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
- Information Technology (0.67)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)
- North America > United States > California (0.04)
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Materials > Chemicals (0.93)
- Information Technology (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
- Information Technology > Biomedical Informatics > Translational Bioinformatics (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Zero-Shot Performance Prediction for Probabilistic Scaling Laws
Schram, Viktoria, Hiller, Markus, Beck, Daniel, Cohn, Trevor
The prediction of learning curves for Natural Language Processing (NLP) models enables informed decision-making to meet specific performance objectives, while reducing computational overhead and lowering the costs associated with dataset acquisition and curation. In this work, we formulate the prediction task as a multitask learning problem, where each task's data is modelled as being organized within a two-layer hierarchy. To model the shared information and dependencies across tasks and hierarchical levels, we employ latent-variable multi-output Gaussian Processes, enabling us to account for task correlations and supporting zero-shot prediction of learning curves (LCs). We demonstrate that this approach facilitates the development of probabilistic scaling laws at lower cost. By applying an active learning strategy, LCs can be queried to reduce predictive uncertainty, yielding predictions close to ground-truth scaling laws. We validate our framework on three small-scale NLP datasets with up to $30$ LCs. These are obtained from nanoGPT models, from bilingual translation using mBART and Transformer models, and from multilingual translation using M2M100 models of varying sizes.
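To make the probabilistic ingredient concrete: a GP posterior over a learning curve yields both a predicted score and an uncertainty at unobserved dataset sizes, and active learning queries where that uncertainty is largest. The sketch below is a deliberately simplified single-output GP with an RBF kernel (the paper uses latent-variable multi-output GPs; `gp_predict` and its hyperparameters are illustrative assumptions):

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    # Squared-exponential kernel over (e.g. log) dataset sizes.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_predict(x_obs, y_obs, x_new, ls=1.0, var=1.0, noise=1e-4):
    # Standard GP regression posterior: mean and per-point std.
    K = rbf(x_obs, x_obs, ls, var) + noise * np.eye(len(x_obs))
    Ks = rbf(x_new, x_obs, ls, var)
    Kss = rbf(x_new, x_new, ls, var)
    alpha = np.linalg.solve(K, y_obs)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Near observed curve points the posterior std collapses toward the noise level, while far from the data it reverts to the prior; an active learner would query the LC at the size with the largest posterior std.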
- Oceania > Australia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- (2 more...)