interval pattern
Efficiently Sampling Interval Patterns from Numerical Databases
Bekkoucha, Djawad, Diop, Lamine, Ouali, Abdelkader, Crémilleux, Bruno, Boizumault, Patrice
Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns are randomly drawn based on an interestingness measure, such as frequency or hyper-volume. This paper presents the first sampling approach designed to handle interval patterns in numerical databases. This approach, named Fips, samples interval patterns proportionally to their frequency. It uses a multi-step sampling procedure and addresses a key challenge in numerical data: accurately determining the number of interval patterns that cover each object. We extend this work with HFips, which samples interval patterns proportionally to both their frequency and hyper-volume. These methods efficiently tackle the well-known long-tail phenomenon in pattern sampling. We formally prove that Fips and HFips sample interval patterns in proportion to their frequency and the product of hyper-volume and frequency, respectively. Through experiments on several databases, we demonstrate the quality of the obtained patterns and their robustness against the long-tail phenomenon.
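To make the idea of frequency-proportional sampling concrete, here is a minimal two-step sketch in Python. It is not the Fips procedure from the paper. It assumes that interval bounds are restricted to values observed in the data, and all function names (`cover_count`, `sample_pattern`, `frequency`) are hypothetical. Step one draws an object with probability proportional to the number of interval patterns covering it. Step two draws one of those covering patterns uniformly. Under these assumptions, each pattern is returned with probability proportional to its frequency.

```python
import random
from bisect import bisect_left


def distinct_sorted(db, j):
    """Sorted distinct values of attribute j (the candidate interval bounds)."""
    return sorted({row[j] for row in db})


def cover_count(row, domains):
    """Number of interval patterns covering `row`, assuming bounds come from observed values."""
    count = 1
    for j, values in enumerate(domains):
        r = bisect_left(values, row[j])  # rank of the row's value in the domain
        # (r + 1) admissible lower bounds times (len(values) - r) upper bounds
        count *= (r + 1) * (len(values) - r)
    return count


def sample_pattern(db, rng=random):
    """Draw one interval pattern with probability proportional to its frequency."""
    domains = [distinct_sorted(db, j) for j in range(len(db[0]))]
    # Step 1: pick an object, weighted by how many patterns cover it.
    weights = [cover_count(row, domains) for row in db]
    row = rng.choices(db, weights=weights, k=1)[0]
    # Step 2: pick uniformly among the patterns covering that object.
    pattern = []
    for j, values in enumerate(domains):
        r = bisect_left(values, row[j])
        low = values[rng.randrange(0, r + 1)]
        high = values[rng.randrange(r, len(values))]
        pattern.append((low, high))
    return tuple(pattern)


def frequency(pattern, db):
    """Number of objects whose every attribute value falls inside the pattern's intervals."""
    return sum(
        all(lo <= x <= hi for x, (lo, hi) in zip(row, pattern)) for row in db
    )
```

For example, `sample_pattern([(1.0, 5.0), (2.0, 3.0), (2.0, 7.0), (4.0, 3.0)])` returns a pair of intervals such as `((1.0, 2.0), (3.0, 5.0))`. A pattern covered by k objects can be reached through each of them, which is why its overall probability is k divided by the total weight.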
Closed pattern mining of interval data and distributional data
Soldano, Henry, Santini, Guillaume, Zevio, Stella
We discuss pattern languages for closed pattern mining and learning of interval data and distributional data. We first introduce pattern languages relying on pairs of intersection-based constraints, pairs of inclusion-based constraints, or both, applied to intervals. We discuss the encoding of such interval patterns as itemsets, which allows the use of closed itemset mining and formal concept analysis programs. We evaluate these languages on clustering and supervised learning tasks. Then we show how to extend the approach to address distributional data.
Revisiting Numerical Pattern Mining with Formal Concept Analysis
Kaytoue, Mehdi, Kuznetsov, Sergei O., Napoli, Amedeo
In this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure (transforming numerical attributes into binary ones), leading to a loss of either information or efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to work directly on numerical data in a more precise and efficient way, and we prove it. For that, the notions of closed patterns, generators and equivalence classes are revisited in the numerical context. Moreover, two original algorithms are proposed and used in an evaluation involving real-world data, showing the predominance of the present approach.
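As a rough illustration of what a closed pattern means for numerical data, the sketch below computes the closure of an interval pattern as the tightest per-attribute [min, max] box around the objects it covers. This is an assumption-laden toy and not one of the two algorithms proposed in the paper. The names `description`, `extent` and `closure` are hypothetical.

```python
def description(rows):
    """Tightest interval pattern covering all rows: per-attribute (min, max)."""
    return tuple((min(col), max(col)) for col in zip(*rows))


def extent(pattern, db):
    """Objects whose every attribute value lies inside the pattern's intervals."""
    return [
        row for row in db
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, pattern))
    ]


def closure(pattern, db):
    """Closed pattern with the same extent, or None if nothing is covered."""
    covered = extent(pattern, db)
    return description(covered) if covered else None
```

With the toy database `[(1, 5), (2, 3), (2, 7), (4, 3)]`, the pattern `((1, 3), (3, 6))` covers the first two objects, and its closure shrinks to `((1, 2), (3, 5))` without losing either of them. Patterns sharing the same closure form one equivalence class.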
Revisiting Numerical Pattern Mining with Formal Concept Analysis
Kaytoue, Mehdi (INRIA Nancy Grand Est - LORIA) | Kuznetsov, Sergei O. (Higher School of Economics - State University) | Napoli, Amedeo (CNRS)
We investigate the problem of mining numerical data with Formal Concept Analysis. The usual way is to use a scaling procedure (transforming numerical attributes into binary ones), leading to a loss of either information or efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to work directly on numerical data in a more precise and efficient way. For that, the notions of closed patterns, generators and equivalence classes are revisited in the numerical context. Moreover, two original algorithms are proposed and tested in an evaluation involving real-world data, showing the quality of the present approach.