Statistical Hypothesis Testing for Information Value (IV)

Rojas, Helder, Alvarez, Cirilo, Rojas, Nilton

arXiv.org Machine Learning

Information value (IV) is a popular technique for feature selection before the modeling phase. Practical criteria based on fixed IV thresholds are commonly used to decide whether a predictor has enough predictive power to be considered in the modeling phase, but these thresholds are somewhat mysterious and lack theoretical justification. Moreover, the mathematical development and statistical inference methods for this technique are almost nonexistent in the literature. In this paper we present a theoretical framework for IV and propose a non-parametric hypothesis test to evaluate the predictive power of the features in a data set. Because of its relationship with divergence measures developed in information theory, we call our proposal the J-Divergence test. We show how to compute our test statistic efficiently and study its performance on simulated data. In various scenarios, particularly on unbalanced data sets, we show its superiority over conventional criteria based on fixed thresholds. Furthermore, we apply our test to fraud identification data and provide an open-source Python library, called "statistical-iv" (https://pypi.org/project/statistical-iv/), where we implement our main results.
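As background for the abstract above, the classical IV quantity it builds on can be sketched in a few lines of Python. This is an illustrative computation of the textbook formula IV = Σ (pᵢ − qᵢ) ln(pᵢ/qᵢ) over bins of a feature, not the paper's statistical-iv library or its hypothesis test; the function name and example counts are invented for illustration.

```python
import math

def information_value(event_counts, nonevent_counts):
    """Classical Information Value of a binned feature:
    IV = sum_i (p_i - q_i) * ln(p_i / q_i), where p_i and q_i are the
    shares of events and non-events that fall in bin i."""
    total_events = sum(event_counts)
    total_nonevents = sum(nonevent_counts)
    iv = 0.0
    for e, n in zip(event_counts, nonevent_counts):
        p = e / total_events
        q = n / total_nonevents
        if p > 0 and q > 0:  # skip empty bins to avoid log(0)
            iv += (p - q) * math.log(p / q)
    return iv

# Example: a feature binned into three groups (hypothetical counts)
print(round(information_value([10, 30, 60], [40, 35, 25]), 4))
```

The "fixed threshold" criteria the paper criticizes amount to comparing this number against cut-offs such as 0.1 or 0.3, regardless of sample size or class balance.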


Benchmarking Non-Parametric Statistical Tests

Neural Information Processing Systems

Although non-parametric tests have already been proposed for that purpose, statistical significance tests for non-standard measures (different from the classification error) are less often used in the literature. This paper is an attempt at empirically verifying how these tests compare with more classical tests under various conditions. More precisely, using a very large dataset to estimate the whole "population", we analyzed the behavior of several statistical tests while varying the class imbalance, the compared models, the performance measure, and the sample size. The main result is that, provided the evaluation sets are big enough, non-parametric tests are relatively reliable in all conditions.


Parametric vs. Non-parametric tests, and when to use them

#artificialintelligence

The fundamentals of data science include computer science, statistics, and math. It's very easy to get caught up in the latest and greatest, most powerful algorithms -- convolutional neural nets, reinforcement learning, etc. As an ML/health researcher and algorithm developer, I often employ these techniques. However, something I have seen run rife in the data science community, after having trained for 10 years as an electrical engineer, is that if all you have is a hammer, everything looks like a nail. Suffice it to say that while many of these exciting algorithms have immense applicability, too often the statistical underpinnings of the data science community are overlooked.


A Non-Parametric Test to Detect Data-Copying in Generative Models

Meehan, Casey, Chaudhuri, Kamalika, Dasgupta, Sanjoy

arXiv.org Machine Learning

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call {\em{data-copying}} -- where the generative model memorizes and outputs training samples or small variations thereof. We provide a three sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets. For code & examples, visit https://github.com/casey-meehan/data-copying


Learn R for Applied Statistics - Programmer Books

#artificialintelligence

Gain the fundamentals of the R programming language for doing the applied statistics useful for data exploration and analysis in data science and data mining. This book covers topics ranging from R syntax basics, descriptive statistics, and data visualizations to inferential statistics and regressions. After learning R's syntax, you will work through data visualizations such as histograms and boxplot charting, descriptive statistics, and inferential statistics such as the t-test, chi-square test, ANOVA, non-parametric tests, and linear regressions. Learn R for Applied Statistics is a timely skills-migration book that equips you with the R programming fundamentals and introduces you to applied statistics for data exploration. It is aimed at those who are interested in data science, in particular data exploration using applied statistics, and the use of R programming for data visualizations.


Comparing Machine Learning Models: Statistical vs. Practical Significance

#artificialintelligence

A lot of work has been done on building and tuning ML models, but a natural question that eventually comes up after all that hard work is -- how do we actually compare the models we've built? If we're facing a choice between models A and B, which one is the winner and why? Could the models be combined together so that optimal performance is achieved? A very shallow approach would be to compare the overall accuracy on the test set, say, model A's accuracy is 94% vs. model B's accuracy is 95%, and blindly conclude that B won the race. In fact, there is so much more than the overall accuracy to investigate and more facts to consider.
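One standard way to go beyond comparing raw accuracies, as the article suggests, is to test whether the two models' disagreements on the same test set are statistically significant. A minimal sketch of McNemar's test (a common choice for paired classifier comparison; the function name and counts here are illustrative, not from the article):

```python
def mcnemar_statistic(b, c):
    """McNemar's chi-square statistic with continuity correction.
    b: test cases model A got right and model B got wrong;
    c: test cases model B got right and model A got wrong.
    Cases where both models agree do not enter the statistic."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Example: the two models disagree on 65 test cases
stat = mcnemar_statistic(b=40, c=25)
print(round(stat, 3))  # compare against the chi-square(1) critical value 3.841
```

If the statistic stays below the critical value, the 94% vs. 95% gap from the example above may well be noise rather than a real difference.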


Detecting stationarity in time series data

#artificialintelligence

Stationarity is an important concept in time series analysis. For a concise (but thorough) introduction to the topic, and the reasons that make it important, take a look at my previous blog post on the topic. As such, the ability to determine if a time series is stationary is important. Rather than deciding between two strict options, this usually means being able to ascertain, with high probability, that a series is generated by a stationary process. In this brief post, I will cover several ways to do just that.
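In practice, "ascertaining with high probability" that a process is stationary is usually done with a formal unit-root test such as the Augmented Dickey-Fuller test (available as `adfuller` in statsmodels). As a purely illustrative stand-in, here is a crude drift heuristic in plain Python: it compares the means of the two halves of a series relative to the overall spread. The function name and tolerance are invented for this sketch and are no substitute for a proper test.

```python
import statistics

def rolling_stats_check(series, tol=0.5):
    """Crude stationarity heuristic (illustrative only, not a formal
    unit-root test): compare the means of the first and second halves
    of the series; a large shift relative to the overall standard
    deviation suggests a drifting (non-stationary) mean."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    mean_shift = abs(statistics.mean(second) - statistics.mean(first))
    scale = statistics.stdev(series) or 1.0  # guard against zero spread
    return mean_shift / scale <= tol  # True -> no obvious mean drift

# A flat oscillating series vs. one with a clear upward trend
print(rolling_stats_check([1, 2, 1, 2, 1, 2, 1, 2]))  # no drift
print(rolling_stats_check([1, 2, 3, 4, 5, 6, 7, 8]))  # strong drift
```

A real workflow would replace this with ADF or KPSS tests, which come with actual p-values.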


A Guide To Conduct Analysis Using Non-Parametric Statistical Tests

@machinelearnbot

The average salary package of an economics honors graduate at Hansraj College at the end of the 1980s was around INR 1,000,000 p.a. That number is significantly higher than for people graduating in the early '80s or early '90s. What could be the reason for such a high average? Well, one of the highest-paid Indian celebrities, Shahrukh Khan, graduated from Hansraj College in 1988, where he was pursuing economics honors. This, and many such examples, tell us that the average is not a good indicator of the center of the data: it can be heavily influenced by outliers. In such cases, looking at the median is a better choice.
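The mean-versus-median contrast in the anecdote above is easy to demonstrate with a tiny example (the numbers here are hypothetical, not the actual Hansraj data):

```python
from statistics import mean, median

# Salaries (in INR lakh p.a.) of a small graduating class,
# with one celebrity outlier at the end
salaries = [3, 4, 4, 5, 5, 6, 500]

print(mean(salaries))    # dragged far upward by the single outlier
print(median(salaries))  # still reflects a typical graduate
```

The mean lands far above every non-outlier value, while the median stays at the middle of the ordinary salaries, which is exactly why non-parametric, rank-based statistics are preferred for skewed data.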