Alice and the Caterpillar: A more descriptive null model for assessing data mining results
Preti, Giulia; Morales, Gianmarco De Francisci; Riondato, Matteo
arXiv.org Artificial Intelligence
We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature.
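Why preserving the Bipartite Joint Degree Matrix (BJDM) also preserves the number of caterpillars can be seen with a short counting argument. In a bipartite graph, every path of length three has a unique middle edge (t, i) joining a transaction t and an item i. The path extends on one side by any of the other deg(t) - 1 items of t and on the other side by any of the other deg(i) - 1 transactions containing i. Summing over edges gives #caterpillars = sum over (a, b) of J[a][b] * (a - 1) * (b - 1), where J[a][b] is the number of edges joining a transaction of length a to an item of support b. The count therefore depends only on the BJDM. The sketch below checks this on toy data. It assumes a transactional dataset given as a list of item sets, so the bipartite graph is simple; the multigraph case for sequence datasets is not covered. The helpers `bjdm`, `caterpillars_from_bjdm`, `caterpillars_brute_force`, and `margins` are hypothetical names chosen for illustration. This is not the Alice sampler.

```python
from collections import Counter


def bjdm(transactions):
    """Hypothetical helper: BJDM of a transactional dataset, stored sparsely as
    (transaction length, item support) -> number of edges with those endpoint degrees."""
    txs = [set(t) for t in transactions]  # assumption: items within a transaction are distinct
    support = Counter(item for t in txs for item in t)
    jdm = Counter()
    for t in txs:
        for item in t:
            jdm[(len(t), support[item])] += 1
    return jdm


def caterpillars_from_bjdm(jdm):
    """Count paths of length three from the BJDM alone: each path has a unique middle
    edge (t, i), extended by one of the other len(t) - 1 items of t and one of the
    other support(i) - 1 transactions that contain i."""
    return sum(n * (a - 1) * (b - 1) for (a, b), n in jdm.items())


def caterpillars_brute_force(transactions):
    """Enumerate paths item u - transaction t - item i - transaction v directly."""
    txs = [set(t) for t in transactions]
    count = 0
    for ti, t in enumerate(txs):
        for i in t:
            others = len(t) - 1  # choices for the end item u != i inside t
            for vi, v in enumerate(txs):
                if vi != ti and i in v:  # end transaction v != t that also contains i
                    count += others
    return count


def margins(transactions):
    """Sorted transaction lengths and sorted item supports (the two degree sequences)."""
    txs = [set(t) for t in transactions]
    support = Counter(item for t in txs for item in t)
    return sorted(len(t) for t in txs), sorted(support.values())


if __name__ == "__main__":
    data = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
    print(caterpillars_from_bjdm(bjdm(data)), caterpillars_brute_force(data))  # 14 14

    # Two datasets with the same transaction lengths and item supports
    d1 = [{"a", "b"}, {"a"}, {"c"}]
    d2 = [{"b", "c"}, {"a"}, {"a"}]
    print(margins(d1) == margins(d2))  # True
    print(bjdm(d1) == bjdm(d2))  # False
    print(caterpillars_from_bjdm(bjdm(d1)), caterpillars_from_bjdm(bjdm(d2)))  # 1 0
```

The second pair shows why the BJDM is the stronger constraint. `d1` and `d2` have identical transaction lengths and item supports but different BJDMs, and only `d1` contains a caterpillar (b, {a, b}, a, {a}). A null model that fixed only those two degree sequences could therefore change the caterpillar count, whereas one that fixes the BJDM cannot.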
Jun-12-2025