nonparametric test
A Kernel Test for Three-Variable Interactions
We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
Hypothesis Testing for Class-Conditional Noise Using Local Maximum Likelihood
Yang, Weisong, Poyiadzi, Rafael, Twomey, Niall, Rodriguez, Raul Santos
In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question. In certain particular cases, hypothesis testing procedures have been proposed to assess whether a given instance-label dataset is contaminated with class-conditional label noise, as opposed to uniform label noise. The existing theory builds on the asymptotic properties of the Maximum Likelihood Estimate for parametric logistic regression. However, the parametric assumptions on top of which these approaches are constructed are often too strong and unrealistic in practice. To alleviate this problem, in this paper we propose an alternative path by showing how similar procedures can be followed when the underlying model is a product of Local Maximum Likelihood Estimation that leads to more flexible nonparametric logistic regression models, which in turn are less susceptible to model misspecification. This different view allows for wider applicability of the tests by offering users access to a richer model class. Similarly to existing works, we assume we have access to anchor points which are provided by the users. We introduce the necessary ingredients for the adaptation of the hypothesis tests to the case of nonparametric logistic regression and empirically compare against the parametric approach presenting both synthetic and real-world case studies and discussing the advantages and limitations of the proposed approach.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Information Technology > Smart Houses & Appliances (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
BEAUTY Powered BEAST
Zhang, Kai, Zhao, Zhigen, Zhou, Wen
We study inference about the uniform distribution with the proposed binary expansion approximation of uniformity (BEAUTY) approach. Through an extension of the celebrated Euler's formula, we approximate the characteristic function of any copula distribution with a linear combination of means of binary interactions from marginal binary expansions. This novel characterization enables a unification of many important existing tests through an approximation from some quadratic form of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve a uniformly high power, we study test statistics with data-adaptive weights through an oracle approach, referred to as the binary expansion adaptive symmetry test (BEAST). By utilizing the properties of the binary expansion filtration, we show that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle leads all existing tests we considered in empirical power against all complex forms of alternatives. This oracle therefore sheds light on the potential of substantial improvements in power and on the form of optimal weights under each alternative. By approximating this oracle with data-adaptive weights, we develop the BEAST that improves the empirical power of many existing tests against a wide spectrum of common alternatives while providing clear interpretation of the form of non-uniformity upon rejection. We illustrate the BEAST with a study of the relationship between the location and brightness of stars.
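The "means of binary interactions from marginal binary expansions" mentioned above can be illustrated with a minimal sketch. It covers only that ingredient, not the BEAST weighting or the test itself. The function names `binary_bits` and `interaction_means` are hypothetical, and the ±1 coding of bits is an assumption made for illustration. Under uniformity on the unit cube the bits are independent fair coin flips, so every interaction mean should be close to zero.

```python
from itertools import combinations

def binary_bits(u, depth):
    """First `depth` binary-expansion bits of u in [0, 1), coded as -1/+1."""
    bits = []
    for _ in range(depth):
        u *= 2.0
        bit = int(u)          # 0 or 1: the next binary digit of u
        u -= bit
        bits.append(2 * bit - 1)
    return bits

def interaction_means(points, depth):
    """Mean of every nonempty product of marginal bits (hypothetical helper).

    `points` is a list of tuples in [0, 1)^p. Keys of the returned dict are
    tuples of (dimension, bit_index) pairs naming the bits in the product.
    """
    dim = len(points[0])
    labels = [(d, k) for d in range(dim) for k in range(depth)]
    # Expand each observation into a dict of +/-1 bits, one per label.
    expanded = []
    for point in points:
        row = {}
        for d, u in enumerate(point):
            for k, b in enumerate(binary_bits(u, depth)):
                row[(d, k)] = b
        expanded.append(row)
    means = {}
    for size in range(1, len(labels) + 1):
        for subset in combinations(labels, size):
            total = 0
            for row in expanded:
                product = 1
                for label in subset:
                    product *= row[label]
                total += product
            means[subset] = total / len(points)
    return means
```

For example, points lying on the diagonal u = v make the product of the two leading bits equal to +1 for every observation, so that interaction mean departs from zero and points to the form of the non-uniformity.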
- North America > United States > New York (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > Colorado > Larimer County > Fort Collins (0.04)
A Kernel Test for Three-Variable Interactions
Sejdinovic, Dino, Gretton, Arthur, Bergsma, Wicher
We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures. Papers published at the Neural Information Processing Systems Conference.
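A minimal sketch of how such a statistic might be computed, assuming the empirical Lancaster statistic is the average of the entrywise product of three centered Gram matrices, (1/n^2) * sum_ij (HKH)_ij (HLH)_ij (HMH)_ij. That form is not stated in the abstract above and is an assumption here, as are the Gaussian kernel, its bandwidth, and the function names. No null distribution or p-value is computed.

```python
import math

def gram(values, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for scalar observations (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]

def center(K):
    """Return HKH with H = I - (1/n) 11^T, i.e. double-centering of K."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def lancaster_statistic(xs, ys, zs):
    """Assumed empirical Lancaster interaction statistic for three samples."""
    n = len(xs)
    Kc, Lc, Mc = center(gram(xs)), center(gram(ys)), center(gram(zs))
    return sum(Kc[i][j] * Lc[i][j] * Mc[i][j]
               for i in range(n) for j in range(n)) / (n * n)
```

On a balanced design with X and Y taking values in {-1, +1} and Z = X * Y, each pair of variables is unrelated while the triple is fully dependent, and the statistic is clearly positive. If instead Z = X with Y unrelated to both, the statistic vanishes, which matches the V-structure intuition in the abstract.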
Detecting stationarity in time series data
Stationarity is an important concept in time series analysis. For a concise (but thorough) introduction to the topic, and the reasons that make it important, take a look at my previous blog post on the topic. As such, the ability to determine if a time series is stationary is important. Rather than deciding between two strict options, this usually means being able to ascertain, with high probability, that a series is generated by a stationary process. In this brief post, I will cover several ways to do just that.
- North America > United States (0.06)
- Europe > Netherlands > South Holland > Delft (0.05)
- Oceania > Australia > Victoria (0.05)
- North America > Canada (0.05)
Importance of Hypothesis Testing in Quality Management
Essentially, good hypotheses lead decision-makers like you to new and better ways to achieve your business goals. When you need to make decisions such as how much you should spend on advertising or what effect a price increase will have on your customer base, it's easy to make wild assumptions or get lost in analysis paralysis. A business hypothesis solves this problem because, from the start, it's based on some foundational information. In all of science, hypotheses are grounded in theory. Theory tells you what you can generally expect from a certain line of inquiry.
Nonparametric Detection of Geometric Structures over Networks
Zou, Shaofeng, Liang, Yingbin, Poor, H. Vincent
Nonparametric detection of the existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests whose probability of error converges to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on the maximum mean discrepancy, which measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is studied first. Sufficient conditions on the minimum and maximum sizes of candidate anomalous intervals are characterized that guarantee the proposed test is consistent. It is also shown that certain necessary conditions must hold for any test to be universally consistent. Comparing the sufficient and necessary conditions shows that the proposed test is order-level optimal in terms of the minimum size of candidate anomalous intervals and nearly optimal in terms of the maximum size. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
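The maximum mean discrepancy idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses the biased (V-statistic) MMD estimate with a Gaussian kernel of arbitrary bandwidth, assumes one scalar sample per node, and scans candidate intervals on a line network by comparing samples inside each interval with those outside. The function names and the scan itself are hypothetical, and no decision threshold is included.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on scalars; the bandwidth is an illustrative choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    n, m = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (n * n)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (m * m)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy

def scan_intervals(samples, min_len, max_len, kernel=gaussian_kernel):
    """Return (best_value, start, end) over candidate intervals [start, end).

    Hypothetical scan: each candidate interval is scored by the squared MMD
    between the samples inside it and the samples outside it.
    """
    n = len(samples)
    best = (float("-inf"), None, None)
    for length in range(min_len, max_len + 1):
        for start in range(0, n - length + 1):
            end = start + length
            inside = samples[start:end]
            outside = samples[:start] + samples[end:]
            if not outside:
                continue  # nothing to compare against
            value = mmd_squared(inside, outside, kernel)
            if value > best[0]:
                best = (value, start, end)
    return best
```

A test in this spirit would declare an anomaly when the best score exceeds a threshold. The `min_len` and `max_len` arguments mirror the minimum and maximum candidate interval sizes that the abstract's consistency conditions constrain.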
- North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
- North America > United States > New York > Onondaga County > Syracuse (0.04)
- North America > United States > New York > New York County > New York City (0.04)