AITopics

2605.06939

Country:

Asia (0.93)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Neural Information Processing Systems, Apr-30-2026, 01:05:44 GMT

Limitations

While our study identifies clear separations between model hypothesis classes, our best models have not yet reached the consistency ceiling of the neural and behavioral benchmarks we compared against. The latent future prediction dynamics modules of all the foundation models were pretrained on Physion, just as the end-to-end models were, and those Physion-trained dynamics modules were evaluated against neural and behavioral data, ultimately outperforming the end-to-end Physion dynamics. Despite our interest, pretraining the end-to-end models on datasets larger than Physion exceeds our current computational resources: models like FitVid require nearly a month of training on eight A100 GPUs with Physion alone. The vision foundation models therefore face the harder problem of generalizing to Physion, compared with the end-to-end models. While we believe our dynamically-equipped foundation model paradigm is a generally promising way forward towards models with strong internal simulations, we identify in the Discussion (Section 7) several ways that their encoder and dynamics modules can be improved, which we plan to explore in future work.

artificial intelligence, machine learning, predictivity, (18 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-9-2026, 09:44:22 GMT

P-GAMs have similar scaling with the number of input dimensions as traditional GLMs

We need some form of group sparsity and traditional ARD can't do that.

artificial intelligence, input dimension, p-gam, (10 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.74)

Technology: Information Technology > Artificial Intelligence (0.31)

Neural Information Processing Systems, Dec-23-2025, 21:32:42 GMT

Automatic Unsupervised Outlier Model Selection

Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)? In this work, we tackle the unsupervised outlier model selection (UOMS) problem, and propose MetaOD, a principled, data-driven approach to UOMS based on meta-learning. The UOMS problem is notoriously challenging, as compared to model selection for classification and clustering, since (i) model evaluation is infeasible due to the lack of hold-out data with labels, and (ii) model comparison is infeasible due to the lack of a universal objective function. MetaOD capitalizes on the performances of a large body of detection models on historical outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without any labels, model evaluations or model comparisons. To capture task similarity within our meta-learning framework, we introduce specialized meta-features that quantify outlying characteristics of a dataset. Extensive experiments show that selecting a model by MetaOD significantly outperforms no model selection (e.g.
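The idea of carrying prior experience over to a new dataset can be illustrated with a deliberately simplified stand-in. The sketch below is not MetaOD itself: it assumes a nearest-neighbor rule in meta-feature space, and the function name `select_model`, the model names, the meta-feature vectors, and the performance values are all hypothetical.

```python
import math

def select_model(new_meta, hist_meta, hist_perf, model_names, k=2):
    """Pick the model with the best mean historical performance on the
    k historical datasets whose meta-features are closest to new_meta."""
    # Euclidean distance between the new dataset and each historical dataset.
    dists = [math.dist(new_meta, m) for m in hist_meta]
    nearest = sorted(range(len(hist_meta)), key=lambda i: dists[i])[:k]
    # Average each candidate model's performance over the nearest datasets.
    mean_perf = [
        sum(hist_perf[i][j] for i in nearest) / len(nearest)
        for j in range(len(model_names))
    ]
    best = max(range(len(model_names)), key=lambda j: mean_perf[j])
    return model_names[best]

# Hypothetical data: 3 historical datasets, 2 meta-features, 2 candidate models.
model_names = ["iforest_100", "lof_20"]          # illustrative names only
hist_meta = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
hist_perf = [[0.80, 0.60], [0.75, 0.65], [0.50, 0.90]]

choice_a = select_model([0.15, 0.85], hist_meta, hist_perf, model_names)
choice_b = select_model([0.95, 0.05], hist_meta, hist_perf, model_names, k=1)
print(choice_a, choice_b)
```

No labels and no model evaluations on the new dataset are needed here; only its meta-features are computed, which is the property the abstract emphasizes.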

automatic unsupervised outlier model selection, dataset, name change, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Stenger, David, Lindicke, Armin, von Rohr, Alexander, Trimpe, Sebastian

Local Entropy Search over Descent Sequences for Bayesian Optimization

arXiv.org Machine Learning, Nov-25-2025

Searching large and complex design spaces for a global optimum can be infeasible and unnecessary. A practical alternative is to iteratively refine the neighborhood of an initial design using local optimization methods such as gradient descent. We propose local entropy search (LES), a Bayesian optimization paradigm that explicitly targets the solutions reachable by the descent sequences of iterative optimizers. The algorithm propagates the posterior belief over the objective through the optimizer, resulting in a probability distribution over descent sequences. It then selects the next evaluation by maximizing mutual information with that distribution, using a combination of analytic entropy calculations and Monte-Carlo sampling of descent sequences. Empirical results on high-complexity synthetic objectives and benchmark problems show that LES achieves strong sample efficiency compared to existing local and global Bayesian optimization methods.

complexity, descent sequence, optimization, (12 more...)

2511.19241

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

arXiv.org Artificial Intelligence, Nov-24-2025

Large language models for automated PRISMA 2020 adherence checking

Kataoka, Yuki, So, Ryuhei, Banno, Masahiro, Tsujimoto, Yasushi, Takayama, Tomohiro, Yamagishi, Yosuke, Tsuge, Takahiro, Yamamoto, Norio, Suda, Chiaki, Furukawa, Toshi A.

Evaluating adherence to the PRISMA 2020 guideline remains a burden in the peer review process. To address the lack of shareable benchmarks, we constructed a copyright-aware benchmark of 108 Creative Commons-licensed systematic reviews and evaluated ten large language models (LLMs) across five input formats. In a development cohort, supplying structured PRISMA 2020 checklists (Markdown, JSON, XML, or plain text) yielded 78.7-79.7% accuracy versus 45.21% for manuscript-only input (p < 0.0001), with no differences between structured formats (p > 0.9). Across models, accuracy ranged from 70.6% to 82.8% with distinct sensitivity-specificity trade-offs, replicated in an independent validation cohort. We then selected Qwen3-Max (a high-sensitivity open-weight model) and extended evaluation to the full dataset (n=120), achieving 95.1% sensitivity and 49.3% specificity. Structured checklist provision substantially improves LLM-based PRISMA assessment, though human expert verification remains essential before editorial decisions.
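Sensitivity and specificity, as quoted in the abstract above, follow the standard confusion-matrix definitions. A minimal sketch follows; the counts are hypothetical and chosen only for illustration, and the choice of positive class is an assumption the abstract does not specify.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives that were detected
    specificity = tn / (tn + fp)  # share of actual negatives that were cleared
    return sensitivity, specificity

# Hypothetical counts for illustration only; not data from the study.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=30, fp=30)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A profile like this (high sensitivity, modest specificity) means few true positives are missed, but many flagged items are false alarms, which is consistent with the abstract's call for human verification.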

large language model, machine learning, natural language, (17 more...)

2511.16707

Country:

Asia > Japan > Honshū > Kansai (0.15)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (0.93)
Information Technology (0.88)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Brauer, Alexej, Menzel, Paul

Gini-based Model Monitoring: A General Framework with an Application to Non-life Insurance Pricing

arXiv.org Machine Learning, Oct-7-2025

In a dynamic landscape where portfolios and environments evolve, maintaining the accuracy of pricing models is critical. To the best of our knowledge, this is the first study to systematically examine concept drift in non-life insurance pricing. We (i) provide an overview of the relevant literature and commonly used methodologies, clarify the distinction between virtual drift and concept drift, and explain their implications for long-run model performance; (ii) review and formalize common performance measures, including the Gini index and deviance loss, and articulate their interpretation; (iii) derive the asymptotic distribution of the Gini index, enabling valid inference and hypothesis testing; and (iv) present a standardized monitoring procedure that indicates when refitting is warranted. We illustrate the framework using a modified real-world portfolio with induced concept drift and discuss practical considerations and pitfalls.
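As a point of reference for the Gini index mentioned above, here is a minimal sketch of one common ranking-based definition: order observations by predicted value, trace the cumulative share of observed outcomes, and take one minus twice the area under that curve. This is an assumption about the general form; the paper's formal definition may differ (for example, in normalization or handling of ties), and the data below are hypothetical.

```python
def gini_index(y_true, y_pred):
    """Gini index of a ranking: 1 - 2 * area under the ordered Lorenz curve."""
    # Sort observed outcomes by predicted value, lowest prediction first.
    ordered = [y for _, y in sorted(zip(y_pred, y_true), key=lambda pair: pair[0])]
    total = sum(ordered)
    n = len(ordered)
    area = 0.0
    running = 0.0
    cum_prev = 0.0
    for y in ordered:
        running += y
        cum = running / total               # cumulative share of observed outcomes
        area += (cum_prev + cum) / 2 / n    # trapezoid of width 1/n
        cum_prev = cum
    return 1.0 - 2.0 * area

# Hypothetical portfolio: one large claim, ranked last (good) or first (bad).
claims = [0.0, 0.0, 0.0, 10.0]
good = gini_index(claims, [1, 2, 3, 4])
bad = gini_index(claims, [4, 3, 2, 1])
flat = gini_index([1.0, 1.0, 1.0, 1.0], [1, 2, 3, 4])
print(good, bad, flat)
```

A monitoring procedure would track such a statistic over time and compare changes against its sampling distribution, which is why the asymptotic result in item (iii) matters.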

concept drift, dataset, gini index, (16 more...)

2510.04556

Country:

Europe > Switzerland (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Overview (1.00)

Industry: Banking & Finance > Insurance (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.34)

Kucharský, Šimon, Mishra, Aayush, Habermann, Daniel, Radev, Stefan T., Bürkner, Paul-Christian

Towards Trustworthy Amortized Bayesian Model Comparison

arXiv.org Machine Learning, Aug-29-2025

Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the reliability of neural surrogates deteriorates when simulation models are misspecified, which is the very case where model comparison is most needed. Thus, we supplement simulation-based training with a self-consistency (SC) loss on unlabeled real data to improve BMC estimates under empirical distribution shifts. Using a numerical experiment and two case studies with real data, we compare amortized evidence estimates with and without SC against analytic or bridge sampling benchmarks. SC improves calibration under model misspecification when analytic likelihoods are available. However, it offers limited gains with neural surrogate likelihoods, making it most practical for trustworthy BMC when likelihoods are exact.
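For readers unfamiliar with how evidence estimates turn into a model ranking, the following sketch applies Bayes' rule over models: posterior model probabilities are proportional to prior times marginal likelihood. It illustrates the generic calculation only, not the amortized or self-consistency machinery of the paper; the log-evidence values are hypothetical.

```python
import math

def posterior_model_probs(log_evidences, log_priors=None):
    """Posterior model probabilities from log marginal likelihoods."""
    if log_priors is None:
        # Equal prior weight for every model (any constant cancels out).
        log_priors = [0.0] * len(log_evidences)
    scores = [le + lp for le, lp in zip(log_evidences, log_priors)]
    m = max(scores)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical log evidences for three candidate models.
probs = posterior_model_probs([-120.3, -118.9, -125.0])
print([round(p, 3) for p in probs])
```

Because the result depends on differences between log evidences, a biased evidence estimate for even one model shifts every posterior probability, which is why calibration under misspecification is the focus of the abstract.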

artificial intelligence, likelihood, machine learning, (18 more...)

2508.20614

Country:

Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry:

Transportation (0.71)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)

Neural Information Processing Systems, Aug-15-2025, 04:25:28 GMT

94d2a3c6dd19337f2511cdf8b4bf907e-AuthorFeedback.pdf

dataset, input dimension, p-gam, (7 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.74)

Technology: Information Technology > Artificial Intelligence (0.31)

Gunes, Berkay, Buder, Sven, Buck, Tobias

A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution

arXiv.org Artificial Intelligence, Jul-9-2025

We present COMPASS, a novel simulation-based inference framework that combines score-based diffusion models with transformer architectures to jointly perform parameter estimation and Bayesian model comparison across competing Galactic Chemical Evolution (GCE) models. COMPASS handles high-dimensional, incomplete, and variable-size stellar abundance datasets. Applied to high-precision elemental abundance measurements, COMPASS evaluates 40 combinations of nucleosynthetic yield tables. The model strongly favours Asymptotic Giant Branch yields from NuGrid and core-collapse SN yields used in the IllustrisTNG simulation, achieving near-unity cumulative posterior probability. Using the preferred model, we infer a steep high-mass IMF slope and an elevated Supernova Ia normalization, consistent with prior solar neighbourhood studies but now derived from fully amortized Bayesian inference. Our results demonstrate that modern SBI methods can robustly constrain uncertain physics in astrophysical simulators and enable principled model selection when analysing complex, simulation-based data.

artificial intelligence, compass, machine learning, (16 more...)

2507.0506

Country:

Europe (0.46)
Oceania > Australia (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)