Improved adaptive semi-unsupervised weighted oversampling (IA-SUWO) using sparsity factor for imbalanced datasets

Ali, Haseeb (2019) Improved adaptive semi-unsupervised weighted oversampling (IA-SUWO) using sparsity factor for imbalanced datasets. Masters thesis, Universiti Tun Hussein Onn Malaysia.


The imbalanced data problem is common in data mining because real-world data are often skewed, which negatively affects classification in machine learning. As a preprocessing step, oversampling techniques have significantly benefited the imbalanced domain: artificial data are generated in the minority class to increase the number of samples and balance the distribution between the two classes. However, existing oversampling techniques suffer from overfitting and over-generalization, which reduce classifier performance. Although many clustering-based oversampling techniques largely overcome these problems, most of them cannot produce an appropriate number of synthetic samples in minority clusters. This study proposes an Improved Adaptive Semi-unsupervised Weighted Oversampling (IA-SUWO) technique that uses a sparsity factor to determine the sparse minority samples in each minority cluster. The technique takes into account sparse minority samples that lie far from the decision boundary. These samples also carry important information for learning the minority class; including them in oversampling further reduces the imbalance ratio and can enhance the learnability of the classifiers. The proposed approach was compared with existing oversampling techniques, namely SMOTE, Borderline-SMOTE, Safe-level SMOTE, and the standard A-SUWO technique, in terms of accuracy. The comparative analysis showed that the proposed approach increased average accuracy by 5 percentage points, from 85% to 90%, over the existing techniques.
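
The general idea of giving sparser minority clusters a larger share of synthetic samples can be illustrated with a minimal Python sketch. The abstract does not state how the thesis defines its sparsity factor, so the score used here (mean distance to the k nearest neighbours, averaged over a cluster) is an assumption made for illustration only, and the function names `allocate_by_sparsity` and `oversample` are hypothetical. Synthetic points are generated by SMOTE-style linear interpolation between a minority sample and one of its neighbours in the same cluster. The sketch assumes that the minority clusters are already given, that each contains at least two distinct points, and it ignores the majority class entirely.

```python
import math
import random


def mean_knn_distance(point, others, k):
    """Average Euclidean distance from `point` to its k nearest neighbours."""
    dists = sorted(math.dist(point, o) for o in others if o is not point)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def allocate_by_sparsity(clusters, n_synthetic, k=2):
    """Split n_synthetic across clusters in proportion to a sparsity score.

    Assumption: the score is the mean k-NN distance averaged over the
    cluster. This is an illustrative stand-in, not the thesis' formula.
    """
    scores = []
    for cluster in clusters:
        per_point = [mean_knn_distance(p, cluster, k) for p in cluster]
        scores.append(sum(per_point) / len(per_point))
    total = sum(scores)
    counts = [int(n_synthetic * s / total) for s in scores]
    # Give any samples lost to rounding down to the sparsest clusters first.
    remainder = n_synthetic - sum(counts)
    order = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def oversample(clusters, n_synthetic, k=2, seed=0):
    """Generate synthetic minority points by interpolating within clusters."""
    rng = random.Random(seed)
    counts = allocate_by_sparsity(clusters, n_synthetic, k)
    synthetic = []
    for cluster, count in zip(clusters, counts):
        for _ in range(count):
            a = rng.choice(cluster)
            neighbours = sorted(
                (p for p in cluster if p is not a),
                key=lambda p: math.dist(a, p),
            )[:k]
            b = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from a to b
            synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic
```

Called with one tightly packed cluster and one widely spread cluster, `allocate_by_sparsity` assigns more synthetic samples to the spread-out cluster, which is the behaviour the abstract attributes to the sparsity factor.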

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75-76.95 Calculating machines
Divisions: Faculty of Computer Science and Information Technology > Department of Web Technology
Depositing User: Mrs. Sabarina Che Mat
Date Deposited: 25 Jul 2021 07:55
Last Modified: 25 Jul 2021 07:55
