Mohd Hanifa, Rafizah (2022) Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
Abstract
Malaysia is a multiracial, multilingual country made up of many ethnic groups, including the Malay, Chinese, Indian, and Bumiputera communities. Malay is a non-tonal language that does not rely on lexical stress. Recognizing a speaker's ethnicity is important because it has many potential applications, such as improving interaction between robots and humans, audio forensics, telephone banking, and electronic commerce. Feature extraction, text independence, and variability coverage are issues in speaker recognition systems. This research established a novel method, Gammatone Frequency Cepstral Coefficients and Pitch (GFCCP), and coupled it with a K-Nearest Neighbours (KNN) classifier in a text-independent system to identify the speaker's ethnicity. The speech corpus consisted of readings of Malay texts by speakers of both genders, aged 10 to 48 years, classified into three ethnic groups: Malay, Chinese, and Indian. GFCC and Mel Frequency Cepstral Coefficients (MFCC) were used to represent the human auditory system. Pitch was added to both MFCC and GFCC because it contributes to differences between human voices and is difficult to imitate. Naïve Bayes, Support Vector Machine (SVM), and KNN classifiers were used to quantify pattern classification performance. The dataset was split using hold-out validation (80% training, 20% testing). System performance was assessed by validation and prediction accuracy. GFCCP achieved the highest validation and prediction accuracy with the KNN classifier: validation accuracy was 100%, 99.6%, and 99.2%, and prediction accuracy was 89.98%, 73.56%, and 72.36%, for 12, 24, and 34 speakers, respectively.
An important finding of the study is that, under noisy conditions, combining pitch with MFCC and with GFCC gave better accuracy than MFCC and GFCC alone, with the GFCC-based combination outperforming the MFCC-based one.
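The evaluation protocol described in the abstract (an 80/20 hold-out split followed by KNN classification into three ethnic groups) can be illustrated with a minimal sketch. This is not the thesis implementation. The feature vectors are synthetic Gaussian stand-ins for GFCCP features, and the function names, the 13-dimensional feature size, `k=3`, and the random seeds are all assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter


def hold_out_split(samples, labels, train_fraction=0.8, seed=0):
    """Shuffle the data and split it into training and test portions."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [samples[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [samples[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k nearest training vectors
    (Euclidean distance)."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y
        for x, y in zip(test_x, test_y)
    )
    return correct / len(test_x)


def make_toy_features(n_per_class=50, dim=13, seed=1):
    """Hypothetical stand-in for per-utterance feature vectors.

    Each class is a Gaussian cluster with a different mean; real GFCCP
    features would come from a speech front end, not from this generator.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for class_index, name in enumerate(["Malay", "Chinese", "Indian"]):
        for _ in range(n_per_class):
            samples.append([rng.gauss(3.0 * class_index, 1.0) for _ in range(dim)])
            labels.append(name)
    return samples, labels


if __name__ == "__main__":
    samples, labels = make_toy_features()
    train_x, train_y, test_x, test_y = hold_out_split(samples, labels)
    print(f"train={len(train_x)} test={len(test_x)}")
    print(f"hold-out accuracy: {accuracy(train_x, train_y, test_x, test_y):.2f}")
```

Because the synthetic clusters are well separated, the accuracy here says nothing about the figures reported in the abstract. The sketch only shows how a hold-out split and a KNN vote fit together.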
Item Type: | Thesis (Doctoral) |
---|---|
Subjects: | T Technology > T Technology (General) |
Divisions: | Faculty of Electrical and Electronic Engineering > Department of Electrical Engineering |
Depositing User: | Mrs. Sabarina Che Mat |
Date Deposited: | 13 May 2024 06:56 |
Last Modified: | 13 May 2024 06:56 |
URI: | http://eprints.uthm.edu.my/id/eprint/10809 |