AITopics | certification

Collaborating Authors

certification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Deployment-complete benchmarking

Mansouri, El Mustapha, Arai, Keigo

arXiv.org Machine LearningMay-26-2026

Benchmarks increasingly guide deployment, procurement and scientific screening, yet a score supports only the response it records, not necessarily the deployment action. We introduce deployment-complete benchmarking, which tests whether benchmark evidence determines a deployment action. A benchmark is complete for a claim exactly when the action is constant on each evidence fiber; mixed fibers expose missing deployment information, and completion curves quantify the evidence required to resolve ambiguity. In controlled response spaces, benchmark-channel conformal coverage of 94.98% transferred poorly to an unmeasured deployment channel (10.07%), whereas response-rank intervals achieved 94.91% coverage; even zero benchmark error certified only 45.4% of candidates at the largest residual size. Public audits revealed incompleteness, including 97.9% mixed Tox21 fibers and zero median certifiable fraction in main Matbench and JARVIS audits. In held-out replays, certify-then-acquire reduced false decisions from 1.19% to 0.027% in Tox21 and from 20.3% to 0.128% in JARVIS, while changing model choice and identifying deployment-relevant probes. Deployment-ready benchmarks should report evidence, supported actions, ambiguity and completion cost rather than scores alone.

artificial intelligence, fiber, machine learning, (18 more...)

arXiv.org Machine Learning

2605.25997

Country: Asia > Japan > Honshū > Kantō (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Khosravi, Hamed, Huo, Xiaoming

arXiv.org Machine LearningMay-21-2026

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity, max-certified-threshold rule. Conformal Selective Acting (CSA) fills it as a per-round wrapper maintaining a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}}\leα+O(N_T^{-1/2})$, (ii) rate-optimal certification matching $Θ(\barη^{-2}\log(1/δ))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.

large language model, machine learning, natural language, (22 more...)

arXiv.org Machine Learning

2605.2027

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

4d215ab7508a3e089af43fb605dd27d1-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 19:40:36 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > France (0.15)

Industry:

Transportation (0.47)
Aerospace & Defense (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

_NeurIPS2023_CR__Certified_Backdoor_Detection.pdf

weixi

Neural Information Processing SystemsApr-24-2026, 22:24:43 GMT

The main purpose of this research is to provide the user of DNN classifiers with a method to detect if the model is backdoor attacked without access to the training set. All attacks used to evaluate our detection method in this paper are created by published backdoor attack strategies on public datasets. Thus, we did not create new threats to society. Moreover, our work provides a new perspective on backdoor defense, as it is the first to address the certification of backdoor detection. It helps other researchers to understand the behavior of deep learning systems facing malicious activities. While existing backdoor detectors are all empirical [67, 20, 75, 41, 69, 6, 56, 13], our work initiates a new research direction - backdoor detection with certification. Moreover, we first exposed that certified backdoor detectors and certified robustness against backdoor attacks complement each other [86, 71, 27, 53].

artificial intelligence, backdoor attack, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

_NeurIPS2023_CR__Certified_Backdoor_Detection.pdf

weixi

Neural Information Processing SystemsApr-24-2026, 22:24:40 GMT

artificial intelligence, backdoor attack, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

146b4bab3f8536a07905f25d367b4924-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 17:51:05 GMT

accuracy, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

146b4bab3f8536a07905f25d367b4924-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 17:51:01 GMT

Tree-based models are used in many high-stakes application domains such as finance and medicine, where robustness and interpretability are of utmost importance. Yet, methods for improving and certifying their robustness are severely under-explored, in contrast to those focusing on neural networks. Targeting this important challenge, we propose deterministic smoothing for decision stump ensembles. Whereas most prior work on randomized smoothing focuses on evaluating arbitrary base models approximately under input randomization, the key insight of our work is that decision stump ensembles enable exact yet efficient evaluation via dynamic programming. Importantly, we obtain deterministic robustness certificates, even jointly over numerical and categorical features, a setting ubiquitous in the real world. Further, we derive an MLE-optimal training method for smoothed decision stumps under randomization and propose two boosting approaches to improve their provable robustness. An extensive experimental evaluation on computer vision and tabular data tasks shows that our approach yields significantly higher certified accuracies than the state-of-the-art for tree-based models. We release all code and trained models at https://github.com/eth-sri/drs.

artificial intelligence, ensemble, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia (0.93)
Europe (0.67)
North America > United States > California > San Francisco County > San Francisco (0.28)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.76)

Add feedback

Average-case hardness of RIP certification

Tengyao Wang, Quentin Berthet, Yaniv Plan

Neural Information Processing SystemsApr-22-2026, 07:05:22 GMT

The restricted isometry property (RIP) for design matrices gives guarantees for optimal recovery in sparse linear models. It is of high interest in compressed sensing and statistical learning. This property is particularly important for computationally efficient recovery methods. As a consequence, even though it is in general NP-hard to check that RIP holds, there have been substantial efforts to find tractable proxies for it. These would allow the construction of RIP matrices and the polynomial-time verification of RIP given an arbitrary matrix. We consider the framework of average-case certifiers, that never wrongly declare that a matrix is RIP, while being often correct for random instances. While there are such functions which are tractable in a suboptimal parameter regime, we show that this is a computationally hard task in any better regime. Our results are based on a new, weaker assumption on the problem of detecting dense subgraphs.

artificial intelligence, machine learning, matrix, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Boroux Versus Rorra Countertop Water Filters, Tested Head to Head

WIREDMar-18-2026, 11:06:00 GMT

In a world of plastic water filter pitchers, I tested two of the new generation of stainless-steel filter systems. I will admit that the popularity of those giant, stainless steel, gravity-fed water filters remained a mystery to me for some years--even as multi-gallon water filter systems from brands like British Berkefeld and Berkey seemed to proliferate equally among lovers of doomsday prepping and holistic wellness retreats. I have been testing much different breeds of water filters for more than a year now, including reverse osmosis filters and water pitchers. But often, the big water filter tanks have seemed as much like status symbols as functional items. If you see a big gravity-fed filter, you know the person in question is serious about wellness, survival, or both. What changed my mind about these big stainless steel filters was microplastics . Most water filter pitchers are made of BPA-free plastic. But as new research shows that bottled-water drinkers ingest tens of thousands of excess microplastic particles, wellness lovers have begun to look askance at water filters that are themselves made of plastic.

artificial intelligence, boroux, certification, (18 more...)

WIRED

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > Scotland (0.04)
(2 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Materials (1.00)
Law (0.68)
Health & Medicine (0.67)
(2 more...)

Technology: Information Technology > Artificial Intelligence (0.68)

Add feedback

Certified Robustness for Deep Equilibrium Models via Serialized Random Smoothing

Neural Information Processing SystemsMar-17-2026, 19:57:01 GMT

Implicit models such as Deep Equilibrium Models (DEQs) have emerged as promising alternative approaches for building deep neural networks. Their certified robustness has gained increasing research attention due to security concerns. Existing certified defenses for DEQs employing interval bound propagation and Lipschitz-bounds not only offer conservative certification bounds but also are restricted to specific forms of DEQs. In this paper, we provide the first randomized smoothing certified defense for DEQs to solve these limitations. Our study reveals that simply applying randomized smoothing to certify DEQs provides certified robustness generalized to large-scale datasets but incurs extremely expensive computation costs. To reduce computational redundancy, we propose a novel Serialized Randomized Smoothing (SRS) approach that leverages historical information. Additionally, we derive a new certified radius estimation for SRS to theoretically ensure the correctness of our algorithm. Extensive experiments and ablation studies on image recognition demonstrate that our algorithm can significantly accelerate the certification of DEQs by up to 7x almost without sacrificing the certified accuracy. The implementation will be publicly available upon the acceptance of this work.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback