UTHM Institutional Repository

Techniques for handling imbalanced datasets when producing classifier models

Yusof, Rozianiwati and Kasmiran, Khairul Azhar and Mustapha, Aida and Mustapha, Norwati and Mohd Zin, Nor Asma (2017) Techniques for handling imbalanced datasets when producing classifier models. Journal of Theoretical and Applied Information Technology, 95 (7). pp. 1425-1440. ISSN 18173195

Full text not available from this repository.


Imbalanced datasets are a well-known problem in data mining, where a dataset is composed of two classes: the majority class and the minority class. The majority class has more instances than the minority class. Recent years have brought increased interest in handling imbalanced datasets, since many datasets are naturally imbalanced. Most existing classification techniques ignore the imbalance and focus on the overall accuracy of the model produced, which is biased towards the majority class while giving poor accuracy on the minority class. Although minority-class instances occur rarely, in some conditions they have an important influence on the classifier model. This paper attempts to list all the techniques for handling imbalanced datasets and to compare them with respect to producing the best classifier model for imbalanced datasets. These techniques are categorized into sampling, feature selection and algorithmic approaches, in the form of a taxonomy for handling imbalanced datasets. The strengths and weaknesses of these approaches are discussed in order to identify an appropriate technique that will improve the performance of the classifier model produced. Recent trends in handling imbalanced datasets are also discussed, based on the domain and the problems that exist in the dataset.
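As an illustration of the accuracy bias described in the abstract, the following minimal Python sketch uses hypothetical toy data (95 majority and 5 minority instances, not taken from the paper). It shows that a classifier that always predicts the majority class scores high accuracy while never finding the minority class, and then applies simple random oversampling as one example of a sampling approach. The function names and the 95/5 split are assumptions made for illustration only, not the authors' method.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of instances of class `cls` that were predicted as `cls`."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in preds_for_cls) / len(preds_for_cls)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen instances of each smaller class
    until every class has as many instances as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        pool = [x for x, label in zip(X, y) if label == cls]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out


# Hypothetical toy dataset: class 0 is the majority, class 1 the minority.
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A trivial "classifier" that always predicts the majority class.
majority_pred = [0] * len(y)
acc = accuracy(y, majority_pred)               # high overall accuracy
minority_recall = recall(y, majority_pred, 1)  # no minority instance found

# Sampling approach: balance the classes before training.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

print(acc, minority_recall, dict(balanced_counts))
```

Random oversampling is only one of the sampling techniques; the feature selection and algorithmic (for example, cost-sensitive) approaches named in the abstract are not shown here.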

Item Type: Article
Uncontrolled Keywords: Imbalanced data; sampling; feature selection; cost sensitive learning; classification
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Faculty of Computer Science and Information Technology > Department of Software Engineering
Depositing User: Mr. Mohammad Shaifulrip Ithnin
Date Deposited: 13 Aug 2018 03:14
Last Modified: 13 Aug 2018 03:14
URI: http://eprints.uthm.edu.my/id/eprint/9367