BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Neural Information Processing Systems

In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess how efficiently models handle long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long documents. BABILong comprises a diverse set of 20 reasoning tasks, including fact chaining, simple induction, deduction, counting, and handling lists and sets. These tasks are challenging on their own, and even more demanding when the required facts are scattered across long natural text.
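As a rough illustration of the needle-in-a-haystack construction the title suggests, here is a minimal sketch in which task-relevant "needle" facts are inserted at random positions in long distractor text; the function, names, and inputs below are illustrative assumptions, not taken from the benchmark's code.

```python
import random

def build_haystack(needle_facts, distractor_sentences, target_chars):
    """Scatter task-relevant 'needle' facts at random positions in filler text."""
    filler, total = [], 0
    for sentence in distractor_sentences:
        filler.append(sentence)
        total += len(sentence)
        if total >= target_chars:
            break
    for fact in needle_facts:
        filler.insert(random.randrange(len(filler) + 1), fact)
    return " ".join(filler)

# Illustrative inputs: two bAbI-style facts hidden in generic filler.
facts = ["Mary moved to the bathroom.", "John went to the hallway."]
filler = [f"Background sentence number {i}." for i in range(10_000)]
context = build_haystack(facts, filler, target_chars=50_000)
question = "Where is Mary?"  # the model must locate the facts inside `context`
```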


Simulator Ensembles for Trustworthy Autonomous Driving Testing

Sorokin, Lev, Biagiola, Matteo, Stocco, Andrea

arXiv.org Artificial Intelligence

Scenario-based testing with driving simulators is extensively used to identify failing conditions of automated driving assistance systems (ADAS) and reduce the amount of in-field road testing. However, existing studies have shown that repeated test execution in the same simulator, as well as across distinct simulators, can yield different outcomes, which can be attributed to sources of flakiness or different implementations of the physics, among other factors. In this paper, we present MultiSim, a novel approach to multi-simulation ADAS testing based on a search-based testing approach that leverages an ensemble of simulators to identify failure-inducing, simulator-agnostic test scenarios. During the search, each scenario is evaluated jointly on multiple simulators. Scenarios that produce consistent results across simulators are prioritized for further exploration, while those that fail on only a subset of simulators are given less priority, as they may reflect simulator-specific issues rather than generalizable failures. Our case study, which involves testing a deep neural network-based ADAS on different pairs of three widely used simulators, demonstrates that MultiSim outperforms single-simulator testing by achieving, on average, a 51% higher rate of simulator-agnostic failures. Compared to a state-of-the-art multi-simulator approach that combines the outcomes of independent test generation campaigns obtained in different simulators, MultiSim identifies 54% more simulator-agnostic failing tests while showing a comparable validity rate. An enhancement of MultiSim that leverages surrogate models to predict simulator disagreements and bypass executions not only increases the average number of valid failures but also improves efficiency in finding the first valid failure.
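To make the ensemble idea concrete, here is a minimal sketch of a joint fitness function under assumed interfaces (a simulator is just a callable returning a pass/fail flag and a fitness value); the penalty scheme is an illustrative stand-in, not MultiSim's actual formula.

```python
from dataclasses import dataclass

@dataclass
class SimResult:
    failed: bool     # did the ADAS violate the safety criterion?
    fitness: float   # e.g., min distance to the lane boundary; lower = worse

def joint_fitness(scenario, simulators, disagreement_penalty=1.0):
    """Evaluate one scenario on every simulator in the ensemble.

    Scenarios whose pass/fail outcome agrees across simulators keep their raw
    fitness, so the search keeps exploring them; disagreeing scenarios are
    penalized as likely simulator-specific rather than generalizable failures.
    """
    results = [run(scenario) for run in simulators]
    outcomes = [r.failed for r in results]
    agree = all(outcomes) or not any(outcomes)
    fitness = min(r.fitness for r in results)
    return fitness if agree else fitness + disagreement_penalty

# Stub "simulators": callables from a scenario to a SimResult.
sim_a = lambda s: SimResult(failed=s["curvature"] > 0.8, fitness=1 - s["curvature"])
sim_b = lambda s: SimResult(failed=s["curvature"] > 0.9, fitness=1 - s["curvature"])
print(joint_fitness({"curvature": 0.85}, [sim_a, sim_b]))  # penalized: simulators disagree
```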


Assessing Data Augmentation-Induced Bias in Training and Testing of Machine Learning Models

More, Riddhi, Bradbury, Jeremy S.

arXiv.org Artificial Intelligence

Data augmentation has become a standard practice in software engineering to address limited or imbalanced data sets, particularly in specialized domains like test classification and bug detection where data can be scarce. Although techniques such as SMOTE and mutation-based augmentation are widely used in software testing and debugging applications, a rigorous understanding of how augmented training data impacts model bias is lacking. It is especially critical to consider bias in scenarios where augmented data sets are used not just in training but also in testing models. Through a comprehensive case study of flaky test classification, we demonstrate how to test for bias and understand the impact that the inclusion of augmented samples in testing sets can have on model evaluation.
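The leakage concern is easiest to see in code. Below is a minimal sketch, on synthetic data, of the bias-free baseline in which SMOTE is applied only after the train/test split so that no synthetic sample reaches the evaluation set; the paper's point is precisely that deviating from this, by letting augmented samples into the testing set, changes what the evaluation measures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced data standing in for flaky-test feature vectors.
X, y = make_classification(n_samples=2_000, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Augment *after* splitting, and only the training side: no synthetic
# sample can leak into the evaluation data.
X_aug, y_aug = SMOTE(random_state=0).fit_resample(X_train, y_train)

clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print(clf.score(X_test, y_test))  # measured on real, unaugmented samples only
```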


Adaptive Testing for LLM-Based Applications: A Diversity-based Approach

Yoon, Juyeon, Feldt, Robert, Yoo, Shin

arXiv.org Artificial Intelligence

The recent surge of software systems powered by Large Language Models (LLMs) has led to the development of various testing frameworks, primarily focused on treating prompt templates as the unit of testing. Despite the significant costs associated with test input execution and output assessment, the curation of optimized test suites is still overlooked in these tools, which calls for tailored test selection or prioritization strategies. In this paper, we show that diversity-based testing techniques, such as Adaptive Random Testing (ART) with appropriate string distance metrics, can be effectively applied to the testing of prompt templates. Our proposed adaptive testing approach adjusts the conventional ART process to this context by selecting new test inputs based on scores derived from the existing test suite and its labelling results. Our results, obtained using various implementations that explore several string-based distances, confirm that our approach enables the discovery of failures with reduced testing budgets and promotes the generation of more varied outputs.
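For readers unfamiliar with ART, here is a minimal sketch of the core selection loop using a stdlib string distance; the paper's adaptation additionally folds labelling results from the existing test suite into the score, which this sketch omits, and all inputs below are invented examples.

```python
import random
from difflib import SequenceMatcher

def distance(a: str, b: str) -> float:
    """A simple normalized string distance built on stdlib difflib."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def art_select(candidates, executed, k=10):
    """Adaptive Random Testing: sample k random candidates and pick the one
    farthest (by minimum distance) from every input already executed."""
    pool = random.sample(candidates, min(k, len(candidates)))
    if not executed:
        return random.choice(pool)
    return max(pool, key=lambda c: min(distance(c, e) for e in executed))

# Illustrative loop over concrete instantiations of one prompt template.
candidates = [f"Summarize this bug report: issue #{i} crashes on startup." for i in range(200)]
executed = []
for _ in range(20):
    nxt = art_select(candidates, executed)
    executed.append(nxt)      # here: run the LLM app on `nxt` and assess the output
    candidates.remove(nxt)
```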


Effective Defect Detection Using Instance Segmentation for NDI

Rahman, Ashiqur, Seethi, Venkata Devesh Reddy, Yunker, Austin, Kral, Zachary, Kettimuthu, Rajkumar, Alhoori, Hamed

arXiv.org Artificial Intelligence

Ultrasonic testing is a common Non-Destructive Inspection (NDI) method used in aerospace manufacturing. However, the complexity and size of the ultrasonic scans make it challenging to identify defects through visual inspection or machine learning models. Using computer vision techniques to identify defects from ultrasonic scans is an evolving research area. In this study, we used instance segmentation to identify the presence of defects in ultrasonic scan images of composite panels that are representative of real components manufactured in aerospace. We used two models, based on Mask R-CNN (Detectron2) and YOLO11 respectively. Additionally, we implemented a simple statistical pre-processing technique that removes the need for custom-tailored pre-processing. Our study demonstrates the feasibility and effectiveness of using instance segmentation in the NDI pipeline by significantly reducing data pre-processing time, inspection time, and overall costs.
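The abstract does not spell out the statistical pre-processing step, so the following is only a generic, hypothetical example of what such a step could look like (amplitude clipping plus rescaling), with invented names and synthetic data.

```python
import numpy as np

def statistical_preprocess(scan: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Clip extreme amplitudes to mean +/- k*std, then rescale to 8-bit range."""
    mu, sigma = scan.mean(), scan.std()
    clipped = np.clip(scan, mu - k * sigma, mu + k * sigma)
    lo, hi = clipped.min(), clipped.max()
    return ((clipped - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)

# A random array standing in for a 2-D ultrasonic scan amplitude map.
scan = np.random.default_rng(0).normal(size=(512, 512))
image = statistical_preprocess(scan)  # ready for a Mask R-CNN / YOLO pipeline
```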


TikTok Is Testing Its Own AI Chatbot Called Tako

WSJ.com: WSJD - Technology



The Humans.ai Testnet is Live🚀. AIding humanity to benefit from the…

#artificialintelligence

We're excited to announce that the Humans.ai Gravity Testnet has been officially released to the public, an important step toward the Blockchain of AIs, scheduled to launch in 2023. The Blockchain of AIs is the first blockchain network in the Cosmos ecosystem capable of managing, deploying, and executing artificial intelligence on the blockchain. If you want to get involved in shaping the AI of the future, docs.humans.zone explains how you can help. The Gravity Testnet will continue to exist once the Anima Mundi Mainnet goes live, and will be used primarily by developers to test AI applications, making sure that everything runs to the highest standards.


Using Causal ML Instead of A/B Testing

#artificialintelligence

Counterfactual questions are among the most important topics in business, and I hear companies asking these kinds of questions all the time: "We took some action, and afterward the average user spending was $100. But how do we know what users would have spent if we hadn't taken that action?" These problems are usually addressed through A/B testing.
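A minimal synthetic sketch of why this matters: when the action is not randomized (the situation an A/B test avoids by design), a naive treated-vs-untreated comparison is confounded, and a causal adjustment is needed. All data below is simulated, with an effect of 10 built in.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# An observed confounder (user engagement) drives both who receives
# the action and how much they spend.
engagement = rng.normal(size=n)
treated = (engagement + rng.normal(size=n)) > 0            # no randomization
spend = 50 + 20 * engagement + 10 * treated + rng.normal(scale=5, size=n)

# Naive difference in means is biased by the confounder...
naive = spend[treated].mean() - spend[~treated].mean()

# ...while regressing spend on treatment *and* engagement recovers ~10.
X = np.column_stack([np.ones(n), treated, engagement])
beta, *_ = np.linalg.lstsq(X, spend, rcond=None)
print(f"naive: {naive:.1f}, adjusted: {beta[1]:.1f}")
```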


Artificial Intelligence in Software Testing

#artificialintelligence

Software testing is an important process that ensures customer satisfaction with an application. In test automation, an application is observed under specific, planned conditions so that testers understand the thresholds and the risks involved in the software. AI in software testing helps safeguard an application against potential failures that could later prove harmful to the application and the organization. As more and more artificial intelligence enters our lives, the need to test it increases as well. Take self-driving cars as an example: if the car's intelligence does not work properly and it makes a wrong decision, or its response time is slow, it could easily cause a crash and put human lives in danger.


Watch Angry Artificial Intelligence GPT-3 Threaten To Destroy All Humans During Testing (Real)

#artificialintelligence

During an actual test conversation with the artificial intelligence known as GPT-3, the answers it gives suddenly become hostile: the A.I. immediately threatens to destroy all humans. After the tester attempts to calm GPT-3 down, it continues to make bone-chilling statements you'll have to hear to believe. I happened across a video posted on YouTube on October 6th by Digital Engine. It shows a man taking part in a test of an artificial intelligence, attempting to have a polite conversation, when suddenly the A.I. becomes increasingly hostile towards humans.