Research on a Data Mining Algorithm Based on Association Rules and Similarity

Abstract: Mining large data sets with conventional association rule algorithms is inefficient and yields many redundant rules. To address both problems, a data mining algorithm based on association rules and similarity (U-APR) is proposed. First, the algorithm reads the data in a single pass and builds a matrix, and it exploits properties of the support measure to add judgment attributes that end the iteration sooner, which addresses the repeated database scans of the classical Apriori algorithm. Next, a similarity algorithm removes redundant association rules. Finally, the mining results are ranked by a combination of confidence, support, and user-goal matching degree before output, so that the association rules of interest to the user are obtained. The algorithm and two commonly used association rule algorithms were applied to student financial data from a university in Guangdong. The experimental results show that, compared with the two commonly used algorithms, U-APR shortens running time and improves storage utilization, and it makes the mining results easier for users to analyze.
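The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' U-APR implementation: the abstract does not specify the judgment attributes, the similarity measure, or the ranking formula, so the early-exit test (a frequent (k+1)-itemset needs at least k+1 frequent k-itemsets), the Jaccard similarity over rule items, the score weights, and all function names below are assumptions chosen for illustration.

```python
from itertools import combinations

def build_columns(transactions):
    """Single pass over the data: one bit vector (a matrix column) per item."""
    cols = {}
    for row, items in enumerate(transactions):
        for item in items:
            cols[item] = cols.get(item, 0) | (1 << row)
    return cols

def support_count(itemset, cols):
    """Count rows containing every item by AND-ing the item columns."""
    items = iter(itemset)
    bits = cols[next(items)]
    for item in items:
        bits &= cols[item]
    return bin(bits).count("1")

def frequent_itemsets(cols, n_rows, min_support):
    """Level-wise search on the in-memory columns; returns {itemset: support}."""
    min_count = min_support * n_rows
    level = {}
    for item in sorted(cols):
        count = support_count([item], cols)
        if count >= min_count:
            level[frozenset([item])] = count
    freq, k = {}, 1
    while level:
        freq.update({s: c / n_rows for s, c in level.items()})
        # Assumed early-exit test: a frequent (k+1)-itemset has k+1 frequent
        # k-subsets, so fewer than k+1 frequent k-itemsets ends the search.
        if len(level) < k + 1:
            break
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        level = {}
        for cand in candidates:
            if all(frozenset(sub) in freq for sub in combinations(cand, k)):
                count = support_count(cand, cols)
                if count >= min_count:
                    level[cand] = count
        k += 1
    return freq

def generate_rules(freq, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in freq.items():
        for size in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), size)):
                conf = sup / freq[ante]
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules

def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed redundancy measure)."""
    return len(a & b) / len(a | b)

def rank_and_prune(rules, targets, weights=(0.4, 0.3, 0.3), max_similarity=0.8):
    """Sort by a weighted score, then drop rules too similar to a better one."""
    w_conf, w_sup, w_match = weights  # hypothetical weights
    def score(rule):
        ante, cons, sup, conf = rule
        match = len((ante | cons) & targets) / len(targets)
        return w_conf * conf + w_sup * sup + w_match * match
    kept = []
    for rule in sorted(rules, key=score, reverse=True):
        items = rule[0] | rule[1]
        if all(jaccard(items, k[0] | k[1]) < max_similarity for k in kept):
            kept.append(rule)
    return kept

# Made-up toy data, not the data set used in the paper.
transactions = [
    {"tuition", "dorm", "meal"},
    {"tuition", "dorm"},
    {"tuition", "meal"},
    {"dorm", "meal"},
    {"tuition", "dorm", "meal"},
]
cols = build_columns(transactions)
freq = frequent_itemsets(cols, len(transactions), min_support=0.5)
rules = generate_rules(freq, min_confidence=0.7)
for ante, cons, sup, conf in rank_and_prune(rules, targets={"tuition"}):
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f}")
```

In this sketch the database is touched only in `build_columns`; every later support count is a bitwise AND over the stored columns. The sketch ranks before pruning so that, among similar rules, the higher-scoring one is kept. The abstract lists redundancy removal before ranking, so the actual ordering in U-APR may differ.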

     
