Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/97347
DC Field | Value | Language
dc.contributor.advisor | Oliveira, Hugo Ricardo Gonçalo | -
dc.contributor.advisor | Alves, Ana Cristina da Costa Oliveira | -
dc.contributor.author | Pinto, Alexandre | -
dc.date.accessioned | 2022-01-24T11:09:22Z | -
dc.date.available | 2022-01-24T11:09:22Z | -
dc.date.issued | 2016-09 | -
dc.identifier.uri | https://hdl.handle.net/10316/97347 | -
dc.description | Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra. | pt
dc.description.abstract | Social networks carry an overwhelming quantity of posted messages. To make their use more productive, it is imperative to filter out irrelevant information. This work focuses on the automatic classification of public social data according to its potential relevance to a general audience, based on journalistic criteria. This means filtering out information that is private, personal, unimportant, or simply irrelevant to the public, which improves the overall quality of social media information. A range of natural language processing toolkits was first assessed on a set of standard tasks, using popular datasets that cover newspaper and social network text. Different learning models were then tested, using linguistic features extracted by some of these toolkits. The prediction of journalistic criteria, which is key to assessing relevance, was also explored using the same features. A new classifier uses the predictions of journalistic criteria, made by an ensemble of linguistic classifiers, as features to detect relevance. The resulting model achieved an F1 score of 0.82 and an area under the curve (AUC) of 0.78. | pt
dc.description.abstract | Given the large quantity of data published on social networks, it is imperative to filter out irrelevant information. This work focuses on the automatic detection of public social data according to its relevance to the general audience. This means filtering out information that is private, personal, unimportant, or simply irrelevant to the public, thus improving the quality of the information. A set of natural language processing tools is tested on a series of standard tasks, using datasets that cover journalistic content and social text. In addition, different learning models are tested, using linguistic features extracted through natural language processing tasks, as well as journalistic criteria. The final system uses the journalistic predictions, made by a set of linguistic classifiers, as attributes to detect relevance. The resulting model achieved an F1 score of 0.82 and an area under the curve (AUC) of 0.78. | pt
dc.language.iso | eng | pt
dc.rights | embargoedAccess | pt
dc.subject | Relevance Assessment | pt
dc.subject | Social Mining | pt
dc.subject | Information Extraction | pt
dc.subject | Natural Language Processing | pt
dc.subject | Automatic Text Classification | pt
dc.subject | Detecção de Relevância (Relevance Detection) | pt
dc.subject | Extracção de Dados Sociais (Social Data Extraction) | pt
dc.subject | Extracção de Conhecimento (Knowledge Extraction) | pt
dc.subject | Processamento de Linguagem Natural (Natural Language Processing) | pt
dc.subject | Classificação Automática de Texto (Automatic Text Classification) | pt
dc.title | Classification of Social Media Posts according to their Relevance | pt
dc.type | masterThesis | pt
degois.publication.location | Coimbra | pt
dc.date.embargo | 2022-08-31 | *
thesis.degree.grantor | 00500::Universidade de Coimbra | pt
thesis.degree.name | Mestrado em Engenharia Informática (Master's in Informatics Engineering) | pt
uc.rechabilitacaoestrangeira | no | pt
uc.date.periodoEmbargo | 2190 | pt
item.grantfulltext | open | -
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
item.openairetype | masterThesis | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.fulltext | Com Texto completo (With full text) | -
crisitem.advisor.researchunit | CISUC - Centre for Informatics and Systems of the University of Coimbra | -
crisitem.advisor.researchunit | CISUC - Centre for Informatics and Systems of the University of Coimbra | -
crisitem.advisor.parentresearchunit | Faculty of Sciences and Technology | -
crisitem.advisor.parentresearchunit | Faculty of Sciences and Technology | -
crisitem.advisor.orcid | 0000-0002-5779-8645 | -
crisitem.advisor.orcid | 0000-0002-3692-338X | -
Appears in Collections: FCTUC Eng.Informática - Teses de Mestrado
Files in This Item:
File | Description | Size | Format
reminds-thesis.pdf | | 1.58 MB | Adobe PDF

Page views: 66 (checked on May 7, 2024)

Downloads: 35 (checked on May 7, 2024)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.