UTHM Institutional Repository

A hybrid semantic search technique for web information retrieval

Abdullah, Noryusliza (2015) A hybrid semantic search technique for web information retrieval. PhD thesis, Universiti Tun Hussein Onn Malaysia.


The vast growth of data on the web is an advantage in terms of availability. However, the ever-increasing volume of data and information makes finding the right information a challenging and urgent task. This creates a need to improve information retrieval (IR). In Web Information Retrieval (WIR), the search engine has become the main resource. Current WIR techniques assist in many ways, such as result ranking, categorization, and semantic searching. Nevertheless, these techniques still need improvement to make results more relevant to users' expectations. To this end, this research proposes a hybrid technique that combines Categorization, Ontology, and User Profiling concepts using Semantic Web (SW) technologies. The objectives of this research were to design, implement, and compare an alternative semantic search IR technique, and to test its effectiveness in a Cloud Computing (CC) environment. WordNet, a lexical ontology resource, was used for keyword categorization because it contains a large body of English-language data, while the UTHM Ontology (UTHM Onto) supported User Profiling. The similarity between WordNet and UTHM Onto was computed using a semantic similarity measurement. The proposed Hybrid Search Engine (Hysse) was compared with other techniques using the Precision Effectiveness Metric. The term "Java" (referring to a programming language, a beverage, or an island) was used to measure precision. For Java as an object-oriented programming language, the mean average precision (MAP) is 93% for Hysse, 89% for WSP, 7% for Doctopush, 73% for Carrot2, and 93% for Google. For Java as a beverage, the MAP is 81% for Hysse, 76% for WSP, 9% for Doctopush, 4% for Carrot2, and 6% for Google. For Java as an island, the MAP is 85% for Hysse, 82% for WSP, 83% for Doctopush, 3% for Carrot2, and 11% for Google. Hysse was tested in CC using MYRENCloud and Amazon Elastic Compute Cloud (EC2). In the cloud, a comparison of Hysse with Doctopush shows good results, with a difference of only 14 ms between them.
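
For readers unfamiliar with the MAP figures reported in the abstract, the following is a minimal sketch of how mean average precision is commonly computed from ranked relevance judgments. It is not taken from the thesis: the function names, the example judgments, and the choice to normalise by the number of relevant results actually retrieved are all assumptions made for illustration.

```python
from typing import List


def average_precision(relevance: List[bool]) -> float:
    """Average precision for one ranked result list.

    relevance[i] is True when the result at rank i + 1 was judged relevant.
    Assumption: we normalise by the number of relevant results retrieved;
    other variants divide by all known relevant documents instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: List[List[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical judgments for two queries about one sense of "Java".
    runs = [
        [True, True, False, True],
        [False, True],
    ]
    print(f"MAP = {mean_average_precision(runs):.3f}")
```

Under these assumptions, a search engine that places relevant results for the intended sense of "Java" near the top of the list scores close to 1, while one that buries them under results for another sense scores close to 0.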

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA76 Computer software
Depositing User: En. Sharul Ahmad
Date Deposited: 11 Apr 2016 03:42
Last Modified: 11 Apr 2016 03:42
URI: http://eprints.uthm.edu.my/id/eprint/7898