AITopics | scan test

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Detection of local geometry in random graphs: information-theoretic and computational limits

Bok, Jinho, Li, Shuangping, Yu, Sophie H.

arXiv.org Machine Learning, Mar-26-2026

We study the problem of detecting local geometry in random graphs. We introduce a model $\mathcal{G}(n, p, d, k)$, in which a hidden community of average size $k$ has edges drawn as a random geometric graph on $\mathbb{S}^{d-1}$, while all remaining edges follow the Erdős-Rényi model $\mathcal{G}(n, p)$. The random geometric graph is generated by thresholding inner products of latent vectors on $\mathbb{S}^{d-1}$, with each edge having marginal probability equal to $p$. Consequently, $\mathcal{G}(n, p, d, k)$ and $\mathcal{G}(n, p)$ are indistinguishable at the level of the marginals, and the signal lies entirely in the edge dependencies induced by the local geometry. We investigate both the information-theoretic and the computational limits of detection. On the information-theoretic side, our upper bounds follow from three tests based on signed triangle counts (a global test, a scan test, and a constrained scan test); our lower bounds follow from two complementary methods: a truncated second moment via Wishart-GOE comparison, and tensorization of the KL divergence. Together, these results settle the detection threshold at $d = \widetilde{\Theta}(k^2 \vee k^6/n^3)$ for fixed $p$, and extend the state-of-the-art bounds from the full model (i.e., $k = n$) for vanishing $p$. On the computational side, we identify a computational-statistical gap and provide evidence for it via the low-degree polynomial framework, as well as by showing that signed cycle counts of length $\ell \geq 4$ are suboptimal.
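To make the model and the signed-triangle tests more concrete, here is a minimal Python sketch under several assumptions that are ours, not the paper's: each vertex joins the hidden community independently with probability $k/n$ (one way to get average size $k$), the threshold on inner products is estimated by Monte Carlo rather than computed exactly, and the scan test is a brute-force maximum over all $k$-subsets, which is only feasible on toy graphs. All function names are hypothetical.

```python
import itertools
import math
import random


def random_unit_vector(d, rng):
    # A normalized standard Gaussian vector is uniform on the sphere S^{d-1}.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def estimate_threshold(d, p, rng, pairs=20000):
    # Monte Carlo estimate of t with P(<x, y> >= t) ~= p for independent uniform
    # x, y on S^{d-1}; an approximation of the exact threshold the model calls for.
    dots = sorted(dot(random_unit_vector(d, rng), random_unit_vector(d, rng))
                  for _ in range(pairs))
    return dots[min(pairs - 1, int((1.0 - p) * pairs))]


def sample_local_geometry(n, p, d, k, seed=0):
    """Sample an adjacency matrix from a sketch of G(n, p, d, k)."""
    rng = random.Random(seed)
    # Assumption: each vertex joins the hidden community independently w.p. k/n.
    community = [v for v in range(n) if rng.random() < k / n]
    latent = {v: random_unit_vector(d, rng) for v in community}
    t = estimate_threshold(d, p, rng)
    adj = [[0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if i in latent and j in latent:
            edge = dot(latent[i], latent[j]) >= t  # geometric edge inside the community
        else:
            edge = rng.random() < p  # Erdős-Rényi edge everywhere else
        adj[i][j] = adj[j][i] = int(edge)
    return adj, community


def signed_triangle_count(adj, p, nodes=None):
    # Sum over vertex triples {i, j, l} of (A_ij - p)(A_jl - p)(A_il - p).
    nodes = range(len(adj)) if nodes is None else nodes
    return sum((adj[i][j] - p) * (adj[j][l] - p) * (adj[i][l] - p)
               for i, j, l in itertools.combinations(nodes, 3))


def scan_statistic(adj, p, k):
    # Brute-force scan: largest signed triangle count over all k-vertex subsets.
    # Exponential in k, so only usable on very small graphs.
    return max(signed_triangle_count(adj, p, s)
               for s in itertools.combinations(range(len(adj)), k))
```

For example, `adj, community = sample_local_geometry(n=60, p=0.5, d=3, k=20)` followed by `signed_triangle_count(adj, 0.5)` gives a global statistic. Under $\mathcal{G}(n, p)$ each summand has mean zero because the three edges are independent, so a value far from zero hints at the edge dependence that the local geometry induces.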

artificial intelligence, inequality, machine learning

2603.24545

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre:

Research Report (0.63)
Overview (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.61)

Phase transition for detecting a small community in a large network

Jin, Jiashun, Ke, Zheng Tracy, Turner, Paxton, Zhang, Anru R.

arXiv.org Artificial Intelligence, Mar-8-2023

Detecting a small community in a large network is an interesting problem that includes clique detection as a special case, where a naive degree-based $\chi^2$-test was shown to be powerful in the presence of an Erdős-Rényi background. Using Sinkhorn's theorem, we show that the signal captured by the $\chi^2$-test may be a modeling artifact, and that it may disappear once we replace the Erdős-Rényi model by a broader network model. We show that the recent SgnQ test is more appropriate for such a setting. The test is optimal for detecting communities with sizes comparable to the whole network, but has never been studied in our setting, which is substantially different and more challenging. Using a degree-corrected block model (DCBM), we establish phase transitions for this testing problem in terms of the size of the small community and the edge densities in the small and large communities. When the size of the small community is larger than $\sqrt{n}$, the SgnQ test is optimal because it attains the computational lower bound (CLB), the information lower bound for methods allowing polynomial computation time. When the size of the small community is smaller than $\sqrt{n}$, we establish the parameter regime where the SgnQ test has full power and make some conjectures about the CLB. We also study the classical information lower bound (LB) and show that there is always a gap between the CLB and the LB in our range of interest.
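The abstract does not restate how SgnQ is computed. As we understand it from the earlier signed-polygon literature, it centers the adjacency matrix $A$ by a rank-one estimate $\hat\eta\hat\eta^\top$, with $\hat\eta = A\mathbf{1}/\sqrt{\mathbf{1}^\top A \mathbf{1}}$, and then sums signed 4-cycles of the centered matrix. The sketch below follows that reading; the function name is hypothetical, the normalization needed to compare against a null distribution is omitted, and the brute-force loop is $O(n^4)$, so it is for illustration only.

```python
import itertools
import math


def sgnq_statistic(adj):
    """Signed-quadrilateral statistic in the spirit of the SgnQ test (sketch)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    total = sum(degrees)  # V = 1^T A 1, i.e. twice the number of edges
    if total == 0:
        raise ValueError("graph has no edges")
    # eta_hat = A 1 / sqrt(V) serves as a degree-based estimate.
    eta = [deg / math.sqrt(total) for deg in degrees]
    # Centered matrix: A_tilde = A - eta_hat eta_hat^T.
    a_tilde = [[adj[i][j] - eta[i] * eta[j] for j in range(n)] for i in range(n)]
    # Sum over ordered 4-tuples of distinct vertices (each 4-cycle is visited
    # once per starting vertex and direction, i.e. 8 times).
    return sum(
        a_tilde[i][j] * a_tilde[j][k] * a_tilde[k][l] * a_tilde[l][i]
        for i, j, k, l in itertools.permutations(range(n), 4)
    )
```

Centering by $\hat\eta\hat\eta^\top$ rather than by a single edge probability is what lets the statistic tolerate degree heterogeneity of the DCBM kind discussed above.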

artificial intelligence, dist, machine learning

2303.05024

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.92)
Information Technology > Communications > Networks (0.90)

Score-Based Change Detection for Gradient-Based Learning Machines

Liu, Lang, Salmon, Joseph, Harchaoui, Zaid

arXiv.org Machine Learning, Jun-26-2021

The widespread use of machine learning algorithms calls for automatic change detection algorithms to monitor their behavior over time. As a machine learning algorithm learns from a continuous, possibly evolving, stream of data, it is desirable and often critical to supplement it with a companion change detection algorithm to facilitate its monitoring and control. We present a generic score-based change detection method that can detect a change in any number of components of a machine learning model trained via empirical risk minimization. This proposed statistical hypothesis test can be readily implemented for such models designed within a differentiable programming framework. We establish the consistency of the hypothesis test and show how to calibrate it to achieve a prescribed false alarm rate. We illustrate the versatility of the approach on synthetic and real data.
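The core idea of a score-based test can be shown in a deliberately tiny setting. The sketch below is not the authors' procedure: it assumes a one-parameter model (a mean fitted by empirical risk minimization under squared loss), a single fixed monitoring window instead of a search over change points and components, and a $\chi^2_1$ calibration that is only approximate when the reference sample is large. Function names are hypothetical.

```python
import statistics


def score_change_statistic(reference, window):
    """Score statistic for a change in a one-parameter squared-loss model (sketch)."""
    # Empirical risk minimizer of 0.5 * (theta - x)^2 on the reference data.
    theta_hat = statistics.fmean(reference)
    sigma2_hat = statistics.pvariance(reference, mu=theta_hat)
    # Per-sample score: gradient of the loss at theta_hat, evaluated on new data.
    scores = [theta_hat - x for x in window]
    # Standardized squared sum of scores; roughly chi^2 with 1 df under no change.
    return sum(scores) ** 2 / (len(window) * sigma2_hat)


def detect_change(reference, window, alpha=0.05):
    # The chi^2_1 quantile equals the squared two-sided standard normal quantile,
    # which gives an (approximate) false alarm rate of alpha.
    threshold = statistics.NormalDist().inv_cdf(1 - alpha / 2) ** 2
    return score_change_statistic(reference, window) > threshold
```

With reference data `[-1, 0, 1]` and a shifted window `[2, 2, 2, 2]` the statistic is 24, well above the 5% threshold of about 3.84, so `detect_change` flags a change. Working with gradients is what makes this style of test easy to apply to any differentiable model: only the per-sample gradient of the loss is needed.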

hypothesis, lin, statistic

2106.14122

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

A practical test for a planted community in heterogeneous networks

Yuan, Mingao, Wen, Qian

arXiv.org Machine Learning, Jan-14-2021

One of the fundamental tasks in graph data mining is to find a planted community (dense subgraph), which has wide applications in biology, finance, spam detection, and so on. For real network data, the existence of a dense subgraph is generally unknown. Statistical tests have been devised to test for the existence of a dense subgraph in a homogeneous random graph. However, many networks exhibit extreme heterogeneity; that is, the degrees of nodes (vertices) do not concentrate around a typical value. The existing tests designed for homogeneous random graphs are not straightforwardly applicable to the heterogeneous case. Recently, a scan test was proposed for detecting a dense subgraph in heterogeneous (inhomogeneous) graphs (BCHV19). However, the computational complexity of the scan test is generally not polynomial in the graph size, which makes the test impractical for moderate or large networks. In this paper, we propose a polynomial-time test whose null limiting distribution is the standard normal distribution. We investigate the power of the test theoretically and evaluate its performance through simulations and a real-data example.

dense subgraph, random graph, theorem 2

2101.05928

Country:

North America > United States > North Dakota > Cass County > Fargo (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.55)
Information Technology > Data Science > Data Mining (0.48)

Community Detection in Sparse Random Networks

Arias-Castro, Ery, Verzelen, Nicolas

arXiv.org Machine Learning, Sep-25-2014

We consider the problem of detecting a tight community in a sparse random network. This is formalized as testing for the existence of a dense random subgraph in a random graph. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph on $N$ vertices with connection probability $p_0$; under the alternative, there is an unknown subgraph on $n$ vertices where the connection probability is $p_1 > p_0$. In Arias-Castro and Verzelen (2012), we focused on the asymptotically dense regime where $p_0$ is large enough that $np_0 > (n/N)^{o(1)}$. We consider here the asymptotically sparse regime where $p_0$ is small enough that $np_0 < (n/N)^{c_0}$ for some $c_0 > 0$. As before, we derive information-theoretic lower bounds, and also establish the performance of various tests. Compared to our previous work, the arguments for the lower bounds are based on the same technology, but are substantially more technical in the details; also, the methods we study are different: besides a variant of the scan statistic, we study other statistics such as the size of the largest connected component, the number of triangles, the eigengap of the adjacency matrix, etc. Our detection bounds are sharp, except in the Poisson regime where we were not able to fully characterize the constant arising in the bound.
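Several of the statistics named above are easy to compute on small examples. The sketch below samples the null or alternative model (assuming, as our own choice, that the planted vertex set is drawn uniformly at random), then computes the size of the largest connected component, the number of triangles, and a brute-force scan statistic taken here as the largest number of edges induced by any $n$-vertex subset. The eigengap is omitted, the scan is exponential in $n$, and all names are hypothetical.

```python
import itertools
import random


def planted_graph(N, n, p0, p1, planted=True, seed=0):
    """Erdős-Rényi G(N, p0), optionally with an n-vertex subgraph of density p1."""
    rng = random.Random(seed)
    community = set(rng.sample(range(N), n)) if planted else set()
    edges = set()
    for i, j in itertools.combinations(range(N), 2):
        prob = p1 if (i in community and j in community) else p0
        if rng.random() < prob:
            edges.add((i, j))
    return edges, community


def largest_component_size(N, edges):
    # Union-find with path halving over the N vertices.
    parent = list(range(N))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    sizes = {}
    for v in range(N):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def triangle_count(N, edges):
    nbrs = [set() for _ in range(N)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    # A triangle a < b < c is counted once, via edge {a, b} and common neighbor c.
    return sum(1 for i, j in edges for k in nbrs[i] & nbrs[j] if k > max(i, j))


def edge_scan_statistic(N, edges, n):
    # Largest number of edges induced by any n-vertex subset (brute force).
    def induced(subset):
        members = set(subset)
        return sum(1 for i, j in edges if i in members and j in members)
    return max(induced(c) for c in itertools.combinations(range(N), n))
```

Comparing these statistics on draws with `planted=False` and `planted=True` gives a quick empirical feel for which ones separate the hypotheses in a given $(N, n, p_0, p_1)$ regime.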

hypothesis, probability, scan test

1308.2955

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Community Detection in Random Networks

Arias-Castro, Ery, Verzelen, Nicolas

arXiv.org Machine Learning, Feb-28-2013

In recent years, the problem of detecting communities in networks has received a large amount of attention, with important applications in the social and biological sciences, among others (Fortunato, 2010). The vast majority of this expansive literature focuses on developing realistic models of (random) networks (Albert and Barabási, 2002; Barabási and Albert, 1999), on designing methods for extracting communities from such networks (Girvan and Newman, 2002; Newman, 2006; Reichardt and Bornholdt, 2006) and on fitting models to network data (Bickel et al., 2011). The underlying model is that of a graph $G = (E, V)$, where $E$ is the set of edges and $V$ is the set of nodes. For example, in a social network, a node would represent an individual and an edge between two nodes would symbolize a friendship or kinship of some sort shared by these two individuals. In the literature just mentioned, almost all the methodology has concentrated on devising graph partitioning methods, with the end goal of clustering the nodes in $V$ into groups with strong inner-connectivity and weak inter-connectivity (Bickel and Chen, 2009; Lancichinetti and Fortunato, 2009; Newman and Girvan, 2004).
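One standard way to score how well a partition achieves "strong inner-connectivity and weak inter-connectivity" is the modularity of Newman and Girvan (2004), $Q = \sum_c \left[ L_c/m - \left( D_c/(2m) \right)^2 \right]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $D_c$ the total degree of its nodes. The sketch below computes it for an undirected, unweighted edge list; it is only an illustration of one criterion, not the approach of the passage above, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # D_c: total degree of the nodes in community c
    for i, j in edges:
        degree_sum[labels[i]] += 1
        degree_sum[labels[j]] += 1
        if labels[i] == labels[j]:
            internal[labels[i]] += 1
    # Q = sum_c [ L_c / m - (D_c / (2m))^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two disjoint triangles labeled as two communities, $Q = 2\,(3/6 - (6/12)^2) = 0.5$, while putting every node in one community gives $Q = 0$.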

artificial intelligence, data mining, probability

1302.7099

Country:

North America > United States (0.67)
Europe (0.46)

Genre: Research Report (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.84)
