AITopics | validation framework

Collaborating Authors

validation framework

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation

Dixon, Matthew Francis

arXiv.org Machine LearningJun-17-2026

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate forecasts, select actions, and adapt their behavior over time. Existing validation methodologies focus primarily on predictive accuracy and therefore provide limited insight into the quality of the underlying decision process. This paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). The framework decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently. Large language models (LLMs) are formalized as approximate Bayesian filtering operators, and a model-risk taxonomy is developed encompassing state-space, filtering, forecast, policy, utility-specification, and parameter risks. The model risk validation methodology is demonstrated through a portfolio-management case study in which an agent infers latent market regimes from market and macroeconomic information, generates belief-conditioned forecasts, and constructs portfolios using a Black--Litterman framework. Empirical validation combines performance analysis, belief calibration diagnostics, coverage tests, ablation studies, and parameter-sensitivity analysis. The results indicate that latent-state inference contributes independently to decision quality and that the principal conclusions remain robust across a broad range of parameter values. The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.

artificial intelligence, belief revision, machine learning, (19 more...)

arXiv.org Machine Learning

2606.17383

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Economy (0.93)
Information Technology > Security & Privacy (0.66)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Multi-Level Collaboration in Model Merging

Li, Qi, Yu, Runpeng, Wang, Xinchao

arXiv.org Artificial IntelligenceMar-3-2025

Parameter-level model merging is an emerging paradigm in multi-task learning with significant promise. Previous research has explored its connections with prediction-level model ensembling-commonly viewed as the upper bound for merging-to reveal the potential of achieving performance consistency between the two. However, this observation relies on certain preconditions, such as being limited to two models, using ViT-based models, and all models are fine-tuned from the same pre-trained checkpoint. To further understand the intrinsic connections between model merging and model ensembling, this paper explores an interesting possibility: If these restrictions are removed, can performance consistency still be achieved between merging and ensembling? To answer this question, we first theoretically establish a performance correlation between merging and ensembling. We find that even when previous restrictions are not met, there is still a way for model merging to attain a near-identical and superior performance similar to that of ensembling. To verify whether our findings are practical, we introduce a validation framework termed Neural Ligand (NeuLig). The learning process of NeuLig is meticulously designed with a specialized loss function supported by theoretical foundations. Experimental results demonstrate the robust resilience of NeuLig in terms of both model scale and the number of collaborating models. For instance, for the case involving 5 CLIP-ViT-B/32 models, parameter-level merging achieves the same performance as prediction-level ensembling (merging: 95.44% vs. ensembling: 95.46%).

collaboration, neulig, performance consistency, (15 more...)

arXiv.org Artificial Intelligence

2503.01268

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Major U.S. bank, a pioneer in the use of machine learning models, teams with Protiviti to improve its model validation framework

#artificialintelligenceDec-10-2019, 20:04:28 GMT

Following the financial crisis of 2007-2008, regulators issued specific guidance to help banks reduce the risk of financial losses or other adverse consequences stemming from decisions based on incorrect or misused financial models. Since then, the guidance has become the model risk management bible for financial institutions. It is used to ensure that model validation, typically performed annually, can identify vulnerabilities in the models and manage them effectively. Recently, the rapid advance and broader adoption of machine learning (ML) models have added more complexity and time to the model validation process. Specifically, ML models have highlighted expertise gaps in in-house model validation teams trained in traditional modeling techniques.

ml model, model validation framework, validation framework, (10 more...)

#artificialintelligence

Industry: Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Derisking machine learning and artificial intelligence

#artificialintelligenceFeb-20-2019, 20:24:41 GMT

The added risk brought on by the complexity of machine-learning models can be mitigated by making well-targeted modifications to existing validation frameworks. Machine learning and artificial intelligence are set to transform the banking industry, using vast amounts of data to build models that improve decision making, tailor services, and improve risk management. According to the McKinsey Global Institute, this could generate value of more than $250 billion in the banking industry.1 1.For the purposes of this article machine learning is broadly defined to include algorithms that learn from data without being explicitly programmed, including, for example, random forests, boosted decision trees, support-vector machines, deep learning, and reinforcement learning. The definition includes both supervised and unsupervised algorithms. For a full primer on the applications of artificial intelligence, we refer the reader to "An executive's guide to AI."

algorithm, artificial intelligence, machine-learning model, (14 more...)

#artificialintelligence

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)

Genre: Workflow (0.47)

Industry:

Banking & Finance (1.00)
Information Technology > Security & Privacy (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Modeling outcomes of soccer matches

Tsokos, Alkeos, Narayanan, Santhosh, Kosmidis, Ioannis, Baio, Gianluca, Cucuringu, Mihai, Whitaker, Gavin, Király, Franz J.

arXiv.org Machine LearningJul-4-2018

We compare various extensions of the Bradley-Terry model and a hierarchical Poisson log-linear model in terms of their performance in predicting the outcome of soccer matches (win, draw, or loss). The parameters of the Bradley-Terry extensions are estimated by maximizing the log-likelihood, or an appropriately penalized version of it, while the posterior densities of the parameters of the hierarchical Poisson log-linear model are approximated using integrated nested Laplace approximations. The prediction performance of the various modeling approaches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the test error. The direct modeling of outcomes via the various Bradley-Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model demonstrate similar behavior in terms of predictive performance.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1807.01623

Country:

Europe > United Kingdom > England > Tyne and Wear > Sunderland (0.04)
North America > United States (0.04)
Europe > United Kingdom > Scotland (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback