Feature-based Model for Extraction and Classification of High Quality Questions in Online Forum

Ojokoh, Bolanle and Igbe, Tobore and Araoye, Ayobami (2017) Feature-based Model for Extraction and Classification of High Quality Questions in Online Forum. British Journal of Mathematics & Computer Science, 22 (1). pp. 1-21. ISSN 22310851


Abstract

Aims: To design and implement a classification-based model using specific features for identification and extraction of high quality questions in a thread.

Study Design: The study is divided into three modules: preprocessing, configuration, and question classification.

Place and Duration of Study: Department of Computer Science, Federal University of Technology, Akure, between June 2016 and December 2016.

Methodology: This research proposes a way of identifying, extracting and classifying questions in order to promote high-quality answers in an online forum. A major issue in question extraction and classification in forums is that the categories considered are restricted to question words such as Who, What, When, Where, Which, Why and How, which are not sufficient to capture all possible questions. In this work, a number of parameters were proposed and aggregated using fuzzy logic for context-based spam detection and removal, in order to improve question identification and classification. Part-of-speech (POS) tagging was applied to analyse the structure of each extracted sentence based on the presence and position of predefined question tags; this addresses issues such as case sensitivity, grammatical construction and synonyms. Questions are classified with Naïve Bayes, and semantic relationships between extracted questions are identified with a cosine similarity model. Experiments were performed on a dataset constructed from the ResearchGate website.
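The abstract names the techniques but not how they are implemented. The following is a minimal, standard-library-only Python sketch of how such a pipeline could fit together, under several assumptions that are not taken from the paper: `looks_like_question` is a simplified word-list heuristic standing in for real POS tagging; the two spam indicators (share of URL tokens, cosine similarity to the thread's opening post), their thresholds, and the use of max as the fuzzy OR are hypothetical; the classifier is a textbook multinomial Naïve Bayes with add-one smoothing; and the category labels in the usage example are invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical cue words; the paper uses predefined POS-based question tags instead.
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can",
               "could", "should", "would", "will", "has", "have"}


def tokenize(text):
    """Lowercase and strip simple punctuation (a stand-in for real tokenization)."""
    words = (w.strip("?.,!;:").lower() for w in text.split())
    return [w for w in words if w]


def looks_like_question(sentence):
    """Heuristic stand-in for the POS check: ends with '?' or opens with a
    wh-word or auxiliary verb (case-insensitive because tokenize lowercases)."""
    tokens = tokenize(sentence)
    return bool(tokens) and (sentence.rstrip().endswith("?")
                             or tokens[0] in WH_WORDS or tokens[0] in AUXILIARIES)


def cosine(a, b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def ramp(x, lo, hi):
    """Fuzzy 'high' membership: 0 at or below lo, 1 at or above hi, linear between."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def spam_degree(post, thread_opener):
    """Aggregate two hypothetical spam indicators with a fuzzy OR (max)."""
    tokens = post.split()
    link_ratio = sum(t.startswith("http") for t in tokens) / max(len(tokens), 1)
    many_links = ramp(link_ratio, 0.05, 0.3)
    off_topic = 1.0 - ramp(cosine(post, thread_opener), 0.1, 0.5)
    return max(many_links, off_topic)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            lp = math.log(n / total)  # log prior
            for w in tokenize(doc):   # add smoothed log likelihoods
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best


if __name__ == "__main__":
    train = [("How do I normalise gene expression data?", "method"),
             ("How can I compute p-values for this test?", "method"),
             ("What is the definition of entropy?", "definition"),
             ("What does overfitting mean?", "definition")]
    clf = NaiveBayes().fit([q for q, _ in train], [y for _, y in train])
    print(clf.predict("How do I compute entropy?"))
```

With this toy training set the query "How do I compute entropy?" is assigned to the invented "method" category, because its words overlap more with that class. Using max for the fuzzy OR means a post is treated as spam-like if either indicator is strong; a weighted average would be a softer alternative.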

Results: We fed questions extracted from the ResearchGate website into the system. The output consists of each question's POS tags and the category into which the question is classified. The number of questions extracted depends on the number of questions available in a forum. We achieved 3015 correctly extracted and classified questions at 80% POS tag occurrence.

Conclusion: Our approach to question identification and classification was effective and covers more question categories than approaches limited to the standard question words. It can be applied to any question answering system.

Item Type: Article
Subjects: Librbary Digital > Computer Science
Depositing User: Unnamed user with email support@librbarydigit.com
Date Deposited: 10 May 2023 11:40
Last Modified: 12 Sep 2024 04:56
URI: http://info.openarchivelibrary.com/id/eprint/631
