Please use this identifier to cite or link to this item:
https://hdl.handle.net/10316/93821
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Silva, Paulo | - |
dc.contributor.author | Goncalves, Carolina | - |
dc.contributor.author | Godinho, Carolina | - |
dc.contributor.author | Antunes, Nuno | - |
dc.contributor.author | Curado, Marília | - |
dc.date.accessioned | 2021-03-20T09:54:54Z | - |
dc.date.available | 2021-03-20T09:54:54Z | - |
dc.date.issued | 2020 | - |
dc.identifier.isbn | 978-1-7281-8695-5 | - |
dc.identifier.issn | 978-1-7281-8695-5 (eISSN) | - |
dc.identifier.issn | 978-1-7281-8696-2 | - |
dc.identifier.uri | https://hdl.handle.net/10316/93821 | - |
dc.description.abstract | Privacy concerns are constantly increasing in different sectors. Regulations such as the EU's General Data Protection Regulation (GDPR) are pressuring organizations to handle individuals' data with reinforced caution. As information systems deal with increasingly large amounts of personal data in essential services, there is a lack of mechanisms to help organizations protect the data subjects involved. In this paper, we propose and evaluate the use of Named Entity Recognition as a way to identify, monitor and validate Personally Identifiable Information. In our experiments, we used three of the most well-known Natural Language Processing tools (NLTK, Stanford CoreNLP, and spaCy). First, we assess the effectiveness of the tools with a generic dataset. Then, machine learning models are trained and evaluated with datasets built on data that contain personally identifiable information. The results show that the models performed well, accurately classifying both generic and more context-specific data. We observe the relationship between training dataset size and performance, and estimate an appropriate training size for this context. Furthermore, we discuss how our proposal can effectively act as a Privacy Enhancing Technology, as well as its potential risks and associated impacts. | - |
dc.language.iso | eng | - |
dc.publisher | IEEE | - |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/786713/EU/Protection and control of Secured Information by means of a privacy enhanced Dashboard | - |
dc.rights | openAccess | - |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | - |
dc.title | Using NLP and Machine Learning to Detect Data Privacy Violations | - |
dc.type | article | - |
degois.publication.firstPage | 972 | - |
degois.publication.lastPage | 977 | - |
degois.publication.location | Toronto | - |
degois.publication.title | IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) | - |
dc.relation.publisherversion | https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9162683 | - |
dc.peerreviewed | yes | - |
dc.identifier.doi | 10.1109/INFOCOMWKSHPS50562.2020.9162683 | - |
dc.date.embargo | 2021-12-31 | * |
uc.date.periodoEmbargo | 730 | - |
item.fulltext | With full text | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.grantfulltext | open | - |
item.languageiso639-1 | en | - |
item.openairetype | article | - |
item.cerifentitytype | Publications | - |
crisitem.author.researchunit | CISUC - Centre for Informatics and Systems of the University of Coimbra | - |
crisitem.author.parentresearchunit | Faculty of Sciences and Technology | - |
crisitem.author.orcid | 0000-0001-6760-4675 | - |
Appears in Collections: | FCTUC Eng.Informática - Artigos em Revistas Internacionais |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
WORKSHOP_ON_SECURITY_AND_PRIVACY_IN_BIG_DATA__Camera_Ready.pdf | | 1.14 MB | Adobe PDF |
This item is licensed under a Creative Commons License