Using machine learning for detecting security vulnerabilities through bug report analysis

Wanderley, Caio Walter Deziderio

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/98208

Title:	Using machine learning for detecting security vulnerabilities through bug report analysis
Other Titles:	Usar machine learning para detectar vulnerabilidades de segurança por meio de análise de relatório de bug
Authors:	Wanderley, Caio Walter Deziderio
Advisors:	Laranjeiro, Carlos Nuno Bizarro e Silva; Teixeira, César Alexandre Domingues
Keywords:	Vulnerability Classification; Security Issues; Security Bug Classification; Security Bug Reports; Machine Learning Algorithms; Machine Learning
Issue Date:	17-Nov-2021
Serial title, monograph or event:	Using machine learning for detecting security vulnerabilities through bug report analysis
Place of publication or event:	DEI- FCTUC
Abstract:	A bug report is a description of an error in a program that was encountered by a developer. As software systems grow more complex, they become prone to bugs, including those that could reveal sensitive information or allow attackers to run malicious software. This is especially true for banks and large-scale companies, where a security bug that reveals users' credentials or leaves the platform vulnerable to malicious attacks could affect human lives. All modern large-scale companies that develop software encounter bugs and have to fill in bug reports, and it is the job of a triager to classify each report according to its description. Done by humans, this task is very time-consuming and error-prone: a triager who is unfamiliar with the areas a report touches on may classify it incorrectly. For this reason, in recent decades several studies have applied text classification to these reports. This thesis first presents an analysis of the approaches used in the literature of the past decade for classifying bug reports. Following that analysis, we experimented with different combinations of machine learning algorithms to determine their impact on performance in vulnerability classification. The best results for the Area Under the Curve (AUC) were 81.21% ± 8.33 when using the Title and Description of a bug report, 79.51% ± 7.96 when using the Title, and 78.04% ± 8.2 when using the Description.
Description:	Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/98208
Rights:	openAccess
Appears in Collections:	UC - Dissertações de Mestrado

Files in This Item:

File	Description	Size	Format
_MSc_Caio__vulnerability_classification.pdf		1.43 MB	Adobe PDF

Page view(s):	69 (checked on Jul 23, 2024)
Download(s):	113 (checked on Jul 23, 2024)

This item is licensed under a Creative Commons License
