Flexible Pattern Discovery and Analysis

Chen, Chien-Ming, Chen, Lili, Gan, Wensheng

arXiv.org Artificial Intelligence 

Based on the analysis of the proportion of utility in the supporting transactions, high utility-occupancy pattern mining (HUOPM) has recently attracted widespread attention in the field of data mining. Unlike high-utility pattern mining (HUPM), which enumerates high-utility (e.g., profitable) patterns, HUOPM aims to find patterns that represent a collection of existing transactions. In practical applications, however, not all patterns are useful or valuable. For example, a pattern might contain too many items; that is, it might be too specific and therefore of little value to users in real life. To obtain qualified patterns with a flexible length, we constrain the minimum and maximum lengths during the mining process and introduce a novel algorithm for mining flexible high utility-occupancy patterns. In addition, a utility-occupancy nested list (UO-nlist) and a frequency-utility-occupancy table (FUO-table) are employed to avoid multiple scans of the database. Experimental results on both real-world and synthetic datasets confirm that the proposed algorithm can effectively control the length of the derived patterns. Moreover, it can decrease the execution time and memory consumption.

The initial motivation for frequent pattern mining (FPM) was to analyze the shopping behavior of customers using transactional databases and to recommend frequently purchased patterns to customers [1], [2], [3], [4], [5]. In this setting, each item is treated as binary, and the primary concern is whether the item appears in a transaction. However, frequent purchase patterns are occasionally less profitable than infrequent purchase patterns with high profits, which poses a fundamental problem.
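The gap between frequency and profit can be seen on a small example. The sketch below is illustrative only: the item names, unit profits, and transactions are invented, and the profit of a pattern is assumed to be the sum of quantity times unit profit over the transactions that contain it.

```python
# Hypothetical unit profit per item (not taken from the paper).
PROFIT = {"bread": 1, "milk": 2, "wine": 30}

# Hypothetical transactions, each mapping an item to its purchased quantity.
DATABASE = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"bread": 3, "milk": 1},
    {"wine": 2, "bread": 1},
]


def contains(transaction, pattern):
    """True if every item of the pattern occurs in the transaction."""
    return all(item in transaction for item in pattern)


def support(pattern, database):
    """Number of transactions that contain the whole pattern."""
    return sum(1 for t in database if contains(t, pattern))


def profit(pattern, database):
    """Assumed profit: quantity * unit profit, summed over supporting transactions."""
    return sum(
        t[item] * PROFIT[item]
        for t in database
        if contains(t, pattern)
        for item in pattern
    )


for pattern in [("bread", "milk"), ("wine",)]:
    print(pattern, "support:", support(pattern, DATABASE),
          "profit:", profit(pattern, DATABASE))
```

On this toy data the pattern (bread, milk) appears in three transactions but yields a profit of 14, whereas (wine) appears only once and yields 60, so a purely frequency-based ranking would favor the less profitable pattern.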
Hence, the discovery of high-utility patterns that consider not only the internal utility (e.g., quantity) but also the external utility (e.g., profit, interest, or weight) [6], [7], [8], [9] has gained substantial research attention. Moreover, a framework called high-utility pattern mining (HUPM) [10], [11] was proposed to address this practical issue. In contrast with FPM, the lack of a downward closure property makes HUPM more difficult and intractable.

This research was partially supported by the National Natural Science Foundation of China (Grant No. 62002136) and the Guangzhou Basic and Applied Basic Research Foundation (Grant No. 202102020277). Wensheng Gan is with the College of Cyber Security, Jinan University, Guangzhou 510632, China.
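The missing downward closure property noted above for HUPM can be checked on a toy database. The following is a minimal sketch under stated assumptions: the items, unit profits, and transactions are hypothetical, and the utility of a pattern is taken to be internal utility (quantity) times external utility (unit profit), summed over its supporting transactions.

```python
# Hypothetical external utility (unit profit) per item.
PROFIT = {"a": 1, "b": 2, "c": 30}

# Hypothetical transactions: item -> internal utility (quantity).
DATABASE = [
    {"a": 2, "b": 1},
    {"a": 1, "b": 2},
    {"a": 1, "c": 2},
]


def supporting(pattern, database):
    """Transactions that contain every item of the pattern."""
    return [t for t in database if all(item in t for item in pattern)]


def support(pattern, database):
    """Frequency of the pattern."""
    return len(supporting(pattern, database))


def utility(pattern, database):
    """Assumed utility: quantity * unit profit over supporting transactions."""
    return sum(
        t[item] * PROFIT[item]
        for t in supporting(pattern, database)
        for item in pattern
    )


subset, superset = ("a",), ("a", "c")

# Support never increases when a pattern is extended (downward closure).
print("support:", support(subset, DATABASE), ">=", support(superset, DATABASE))

# Utility can increase when a pattern is extended, so no such guarantee holds.
print("utility:", utility(subset, DATABASE), "<", utility(superset, DATABASE))
```

With a minimum utility threshold of, say, 50, the pattern (a) has utility 4 and would fail, yet its extension (a, c) has utility 61 and would pass. Discarding (a) early, as a frequency-based miner could safely do with an infrequent pattern, would therefore lose a valid result here.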