NgramPOS: a bigram-based linguistic and statistical feature process model for unstructured text classification

Yazdani, Sepideh Foroozan and Tan, Zhiyuan and Kakavand, Mohsen and Mustapha, Aida (2018) NgramPOS: a bigram-based linguistic and statistical feature process model for unstructured text classification. Wireless Networks. pp. 1-11. ISSN 1022-0038

Abstract

Research in the financial domain has shown that the sentiment of stock news has a profound impact on trading volume, volatility, stock prices and firm earnings. In-depth analysis of stock news is now sourced from financial reviews on various social networking and marketing sites to help improve decision making. However, such reviews take the form of unstructured text, which requires natural language processing (NLP) to extract the sentiments. Accordingly, in this study we investigate the use of NLP tasks in an effort to improve the performance of sentiment classification when evaluating the information content of financial news as an instrument in an investment decision support system. At present, feature extraction approaches are mainly based on the occurrence frequency of words, so low-frequency linguistic features that could be critical in sentiment classification are typically ignored. In this research, we attempt to improve current sentiment analysis approaches for financial news classification by focusing on low-frequency but informative linguistic expressions. Our proposed combination of low- and high-frequency linguistic expressions contributes a novel set of features for sentiment classification. The experimental results show that an optimal Ngram feature selection (a combination of optimal unigram and bigram features) enhances sentiment classification accuracy compared with other types of feature sets.
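
The idea of combining unigram and bigram features without discarding rare expressions can be illustrated with a minimal Python sketch. This is not the NgramPOS model itself, whose details are not given in this record. The toy headlines, the function names and the class-purity criterion (`min_purity`) are all assumptions made for illustration. The sketch selects n-grams by how strongly they concentrate in one sentiment class rather than by how often they occur, so a rare but informative bigram such as "profit warning" survives selection while a frequent but uninformative unigram such as "shares" does not.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text):
    """Set of unigrams and bigrams present in a text."""
    tokens = tokenize(text)
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))

def select_features(docs, labels, min_purity=1.0):
    """Keep n-grams whose documents fall mostly in one class.

    Purity is the share of an n-gram's documents that belong to its
    majority class. No minimum frequency is applied, so rare n-grams
    are retained (hypothetical criterion, for illustration only).
    """
    doc_freq = Counter()
    pos_freq = Counter()
    for text, label in zip(docs, labels):
        for feat in extract_features(text):
            doc_freq[feat] += 1
            if label == 1:
                pos_freq[feat] += 1
    selected = set()
    for feat, count in doc_freq.items():
        purity = max(pos_freq[feat], count - pos_freq[feat]) / count
        if purity >= min_purity:
            selected.add(feat)
    return sorted(selected)

def vectorize(text, features):
    """Binary presence vector over the selected features."""
    present = extract_features(text)
    return [1 if feat in present else 0 for feat in features]

# Toy headlines (invented): 1 = positive sentiment, 0 = negative sentiment.
docs = [
    "Shares surged after strong earnings",
    "Profit warning sent shares lower",
    "Shares rallied on record profit",
    "Shares fell after profit warning",
]
labels = [1, 0, 1, 0]

features = select_features(docs, labels)
vector = vectorize("Profit warning issued", features)
```

With `min_purity=1.0` every n-gram seen in a single document is kept, which is noisy on real data. A practical pipeline would more likely rank features with a statistic such as chi-square or information gain and feed the resulting vectors to a standard classifier.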

Item Type: Article
Uncontrolled Keywords: Unstructured text; Bigram model; Machine learning; Natural language processing; Sentiment classification; Financial news analysis
Subjects: Q Science > QA Mathematics > QA76 Computer software; T Technology > TA Engineering (General). Civil engineering (General); T Technology > TA Engineering (General). Civil engineering (General) > TA329-348 Engineering mathematics. Engineering analysis
Divisions: Faculty of Applied Science and Technology > Department of Mathematics and Statistics
Depositing User: UiTM Student Praktikal
Date Deposited: 06 Jan 2022 02:29
Last Modified: 06 Jan 2022 02:29
URI: http://eprints.uthm.edu.my/id/eprint/5136
