UTHM Institutional Repository

A new genetic algorithm based clustering for binary and imbalanced class data sets

Saharan, Sabariah (2016) A new genetic algorithm based clustering for binary and imbalanced class data sets. PhD thesis, University of Canterbury.




This research was initially driven by the lack of clustering algorithms that focus specifically on binary data. To address this gap, a promising technique for analysing this type of data, the Genetic Algorithm (GA), became the main subject of the research. GAs have an intrinsic search parallelism that helps them avoid becoming trapped in local optima and reduces their sensitivity to poor initialization. In this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster binary data streams. Before the proposed method was developed, a well-known GA-based clustering method, GCUK, was applied to binary data sets, a new application for that method, to gauge how well it clusters such data. This led to a new method, Genetic Algorithm-Incremental K-means (GAIKM), whose objective function is based on a few sufficient statistics that can be calculated easily and quickly on binary numbers. Unlike other clustering algorithms for binary data, the proposed method converges quickly because it uses IKM. In addition, the GA provides a continuous search for the best solutions, which lets the method escape the local optima that trap other clustering methods. The results show that GAIKM is an efficient and effective new clustering algorithm compared with other clustering algorithms and with IKM itself. The other main contribution of this research is the ability of the proposed GAIKM to cluster imbalanced data sets, to which standard clustering algorithms cannot simply be applied because they may produce misclassifications. In conclusion, GAIKM outperformed other clustering algorithms, and it paves the way for future research on missing data and outliers and on GA multi-objective optimization.
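
To illustrate what "a few sufficient statistics" can mean for incremental K-means on binary data, the following is a minimal sketch and not the thesis's GAIKM implementation. It assumes each cluster keeps only a point count and a per-feature sum of ones, so a centroid is their ratio and can be updated in a single pass over the stream. The function name `incremental_kmeans_binary`, the `init_idx` parameter, and the squared Euclidean distance are assumptions made for this example; the GA component is omitted.

```python
import numpy as np

def incremental_kmeans_binary(X, init_idx):
    """One-pass incremental K-means on a 0/1 matrix X (n_samples, n_features).

    Illustrative sketch only. `init_idx` (hypothetical parameter) lists the
    rows used to seed the clusters, one row per cluster.
    """
    X = np.asarray(X, dtype=float)
    k = len(init_idx)
    # Sufficient statistics per cluster: count N_j and per-feature sum M_j.
    counts = np.ones(k)
    sums = X[list(init_idx)].copy()
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        # Centroid of cluster j is M_j / N_j (fraction of ones per feature).
        centroids = sums / counts[:, None]
        # Assign the point to the nearest centroid (squared Euclidean distance).
        j = int(np.argmin(((centroids - x) ** 2).sum(axis=1)))
        labels[i] = j
        # Update the statistics immediately, without revisiting earlier points.
        # Note: seed rows are counted again when they arrive in the stream,
        # which is acceptable for a sketch.
        counts[j] += 1
        sums[j] += x
    return labels, sums / counts[:, None]

if __name__ == "__main__":
    data = [[1, 1, 1, 0, 0, 0]] * 5 + [[0, 0, 0, 1, 1, 1]] * 5
    labels, centroids = incremental_kmeans_binary(data, init_idx=[0, 5])
    print(labels)
    print(centroids)
```

In a GA-based variant, one plausible design would be to let the GA search over initializations or assignments and score each candidate with a function of these statistics, but the exact encoding and objective used in the thesis are not given in this abstract.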

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics
Depositing User: Mr. Mohammad Shaifulrip Ithnin
Date Deposited: 13 Aug 2018 03:27
Last Modified: 13 Aug 2018 03:27
URI: http://eprints.uthm.edu.my/id/eprint/10258
