A hybrid model for discovering significant patterns in data mining

Abdullah, Zailani (2012) A hybrid model for discovering significant patterns in data mining. Masters thesis, Universiti Tun Hussein Onn Malaysia.

Abstract

Significant pattern mining is one of the most important research areas and a major concern in data mining. Significant patterns are very useful because they can reveal a new dimension of knowledge in certain application domains. There are three categories of significant patterns: frequent patterns, least patterns and significant least patterns. Typically, these patterns are derived from the absolute frequent patterns or are mixed with the least patterns. In market-basket analysis, frequent patterns are considered significant patterns and have already made many contributions. The Frequent Pattern Tree (FP-Tree) is one of the best-known data structures for handling batched frequent patterns, but it must rely on the original database. For detecting exceptional occurrences or events that have high implications, such as unanticipated substances that cause air pollution, unexpected degree programs selected by students, or unpredictable motorcycle models preferred by customers, the least patterns are more meaningful than the frequent ones. However, for this category of patterns, generating a standard tree data structure may trigger a memory overflow because the minimum support threshold must be lowered. Furthermore, the classical support-confidence measure has many limitations: choosing the right support and confidence values is difficult, interpretations based on the support-confidence combination can be misleading, and the measure is not scalable enough to deal with significant least patterns. Therefore, to overcome these drawbacks, this thesis proposes a Hybrid Model for Discovering Significant Patterns (Hy-DSP), which consists of a combination of the Efficient Frequent Pattern Mining Model (EFP-M2), the Efficient Least Pattern Mining Model (ELP-M2) and the Significant Least Pattern Mining Model (SLP-M2). The proposed model is developed using the latest .NET framework with C# as the programming language.
Experiments with the UCI datasets showed that Hy-DSP, which consists of DOSTrieIT and LP-Growth*, outperformed the benchmark CanTree and FP-Growth by up to 4.13 times (75.78%) and 10.37 times (90.31%), respectively, thus verifying its efficiency. In addition, the number of patterns produced by the models is also smaller than that produced by the standard measures.
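
The limitation of the classical support-confidence measure noted in the abstract can be illustrated with a small sketch. The Python code below is not taken from the thesis (which reports a C# implementation); the transactions, item names and the 0.3 threshold are hypothetical values chosen only to show how a rare but perfectly reliable rule is discarded unless the minimum support is lowered.

```python
# Toy transactions (hypothetical data, not from the thesis).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
    {"caviar", "champagne"},
    {"bread"},
    {"milk", "butter"},
    {"bread", "milk"},
    {"caviar", "champagne"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent union consequent) divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


MIN_SUPPORT = 0.3  # assumed threshold, for illustration only

rules = [({"bread"}, {"milk"}), ({"caviar"}, {"champagne"})]
for antecedent, consequent in rules:
    s = support(antecedent | consequent, transactions)
    c = confidence(antecedent, consequent, transactions)
    kept = s >= MIN_SUPPORT
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={s:.2f} confidence={c:.2f} kept={kept}")
```

In this toy data the rule caviar -> champagne holds in every transaction where caviar appears, yet its support falls below the assumed threshold, so a frequent-pattern miner would prune it. Lowering the threshold keeps the rule but also admits many more candidate itemsets, which is the memory pressure the abstract describes for standard tree structures.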

Item Type:	Thesis (Masters)
Subjects:	Q Science > QA Mathematics; Q Science > QA Mathematics > QA76 Computer software
Divisions:	Faculty of Computer Science and Information Technology > Department of Software Engineering
Depositing User:	Mrs. Sabarina Che Mat
Date Deposited:	31 Oct 2021 04:06
Last Modified:	31 Oct 2021 04:06
URI:	http://eprints.uthm.edu.my/id/eprint/2207
