UTHM Institutional Repository

Text categorization based on fuzzy soft set theory

Handaga, Bana and Mat Deris, Mustafa (2012) Text categorization based on fuzzy soft set theory. In: ICCSA'12: Proceedings of the 12th international conference on Computational Science and Its Applications, 18-21 June 2012, Bahia, Brazil.

Full text not available from this repository.


In this paper, we propose a new method for text categorization based on fuzzy soft set theory, called the fuzzy soft set classifier (FSSC). We use a fuzzy soft set representation derived from the bag-of-words representation, and define each term as a distinct word in the set of words of the document collection. The FSSC categorizes each document using the fuzzy c-means formula for classification, and uses fuzzy soft set similarity to measure the distance between two documents. We perform experiments on the standard Reuters-21578 dataset using three kinds of weighting (boolean, term frequency, and term frequency-inverse document frequency) to compare the performance of the FSSC with four other classifiers: kNN, Bayesian, Rocchio, and SVM. We use precision, recall, F-measure, return size, and running time as performance measures. The results show that there is no absolute winner. The FSSC has lower precision, recall, and F-measure than SVM and kNN, but runs faster than both. Compared with the Bayesian and Rocchio classifiers, the FSSC runs more slowly but achieves higher precision and F-measure.
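The three weighting schemes and the evaluation measures named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the code uses the common textbook definitions (natural-log inverse document frequency, F-measure as the harmonic mean of precision and recall). The function names `weight_documents`, `to_membership`, and `precision_recall_f1` are hypothetical, and dividing by the largest weight to obtain membership values in [0, 1] is an assumption made here for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a document into lowercase whitespace-separated terms."""
    return text.lower().split()


def weight_documents(docs, scheme="tfidf"):
    """Return one {term: weight} dict per document (hypothetical helper).

    scheme is "boolean", "tf", or "tfidf"; these are textbook definitions,
    not necessarily the exact variants used in the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)
        if scheme == "boolean":
            weights = {t: 1.0 for t in tf}
        elif scheme == "tf":
            weights = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
        weighted.append(weights)
    return weighted


def to_membership(weights):
    """Scale weights into [0, 1] by dividing by the largest weight.

    Assumption: the abstract does not say how membership degrees are derived.
    """
    peak = max(weights.values(), default=0.0)
    if peak <= 0.0:
        return {t: 0.0 for t in weights}
    return {t: w / peak for t, w in weights.items()}


def precision_recall_f1(predicted, actual):
    """Precision, recall, and F-measure for two sets of document ids."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, `weight_documents(["wheat corn wheat", "corn trade"], "tf")` gives the first document a weight of 2.0 for "wheat" and 1.0 for "corn"; under "tfidf" the weight of "corn" drops to zero because the term appears in every document.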

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: bag-of-words; fuzzy soft set theory; text classification
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Faculty of Computer Science and Information Technology > Department of Software Engineering
Depositing User: Normajihan Abd. Rahman
Date Deposited: 15 Apr 2013 05:31
Last Modified: 15 Apr 2013 05:31
URI: http://eprints.uthm.edu.my/id/eprint/3585
