Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/100540
Title: Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
Authors: Antas, João; Silva, Rodrigo Rocha; Bernardino, Jorge
Keywords: big data; COVID-19; Data Mining; SQL and NoSQL databases
Issue Date: 2022
Serial title, monograph or event: Computers
Volume: 11
Issue: 2
Abstract: COVID-19 has had enormous negative impacts on human lives and the world economy. To help in the fight against this pandemic, this study evaluates different database systems and selects the most suitable one for storing, handling, and mining COVID-19 data. We evaluate different SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate Data Mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, using data classification tests in the Orange Data Mining software. Classification tests were performed using cross-validation on a table with about 3 million records, including COVID-19 exams with patients’ symptoms. The Random Forest algorithm obtained the best average accuracy, recall, precision, and F1 score in the COVID-19 predictive model built in the mining stage. In the performance evaluation, MongoDB presented the best results for almost all tests with a large data volume.
URI: https://hdl.handle.net/10316/100540
ISSN: 2073-431X
DOI: 10.3390/computers11020029
Rights: openAccess
Appears in Collections: I&D CISUC - Artigos em Revistas Internacionais

This item is licensed under a Creative Commons License.