AN INTERACTIVE TOOL FOR EXTRACTING LOW-QUALITY SPREADSHEET TABLES AND CONVERTING INTO RELATIONAL DATABASE

Awad, Arwa and Roushdy, Mohamed and ElGohary, Rania and Moawad, Ibrahim (2021) AN INTERACTIVE TOOL FOR EXTRACTING LOW-QUALITY SPREADSHEET TABLES AND CONVERTING INTO RELATIONAL DATABASE. International Journal of Intelligent Computing and Information Sciences, 21 (1). pp. 1-18. ISSN 2535-1710

Abstract

Spreadsheets contain critical information on a wide range of topics and are among the most widely used tools in many domains, with a huge number of users around the world. Their popularity stems from their convenience, their support for reporting and for presenting data as charts and graphs, and the great freedom they give their creators in encoding data. Tables account for a large share of spreadsheet data. The growth in the volume and complexity of these tables has increased the need to preserve this data and reuse it. However, spreadsheets are hard to integrate with other data sources, and as a result the data stored in them is often of low quality.
We present an automated extractor tool that lets an ordinary user, with no experience in any programming language, extract high-quality relational tables from spreadsheets. The paper implements novel heuristic-based algorithms for extracting tables from a spreadsheet, and applies data improvement and quality rules based on a domain ontology to convert low-quality semi-structured data into high-quality relational data for reusability and integration. The tool is implemented as a Java program that interfaces with a SQL Server database. Experiments were carried out on 2 real public datasets. With the proposed approach, the reported performance improvement for extracting duplicated records is 100% on both datasets, and the percentage of successfully identified tables is 100% and 85%, respectively.
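As a rough illustration of what heuristic table extraction and duplicate-record detection can look like, the following Java sketch splits a sheet into candidate tables wherever a blank row occurs, then drops repeated records. It is not the algorithm from the paper, which the abstract does not describe in detail. The class name TableSketch, the method names, the blank-row heuristic, and the trim-and-lowercase normalization are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the tool described in the paper.
class TableSketch {

    // A row counts as blank when every cell is null or only whitespace.
    static boolean isBlank(List<String> row) {
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Assumed heuristic: each run of consecutive non-blank rows is one candidate table.
    static List<List<List<String>>> splitTables(List<List<String>> sheet) {
        List<List<List<String>>> tables = new ArrayList<>();
        List<List<String>> current = new ArrayList<>();
        for (List<String> row : sheet) {
            if (isBlank(row)) {
                if (!current.isEmpty()) {
                    tables.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(row);
            }
        }
        if (!current.isEmpty()) {
            tables.add(current);
        }
        return tables;
    }

    // Keeps the first occurrence of each record; rows are compared after
    // trimming and lower-casing every cell (an assumed normalization rule).
    static List<List<String>> dropDuplicates(List<List<String>> rows) {
        Set<List<String>> seen = new HashSet<>();
        List<List<String>> unique = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> key = new ArrayList<>();
            for (String cell : row) {
                key.add(cell == null ? "" : cell.trim().toLowerCase(Locale.ROOT));
            }
            if (seen.add(key)) {
                unique.add(row);
            }
        }
        return unique;
    }
}
```

A real spreadsheet would first have to be read into rows of cell strings, and a production tool would need further heuristics (header detection, merged cells, side-by-side tables) that this sketch leaves out.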

Item Type: Article
Subjects: Librbary Digital > Computer Science
Depositing User: Unnamed user with email support@librbarydigit.com
Date Deposited: 28 Jun 2023 05:20
Last Modified: 05 Jun 2024 10:32
URI: http://info.openarchivelibrary.com/id/eprint/1054