Artificial Intelligence and Machine Learning for security and privacy in Cloud-Native environments

Domingues, João Bernardo do Nascimento

Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/107879

Title:	Artificial Intelligence and Machine Learning for security and privacy in Cloud-Native environments
Other Titles:	Inteligência Artificial e Machine Learning para segurança e privacidade em ambientes Cloud-Native
Authors:	Domingues, João Bernardo do Nascimento
Advisors:	Cordeiro, Luis Filipe Vieira; Simões, Marco António Machado
Keywords:	Machine Learning (ML); Network Anomaly Detection; Security; Autoencoders; Unsupervised Learning
Issue Date:	24-Jul-2023
Serial title, monograph or event:	Artificial Intelligence and Machine Learning for security and privacy in Cloud-Native environments
Place of publication or event:	OneSource, Consultoria Informática, Lda.
Abstract:	The use of cloud-native applications has grown in recent years, but these applications still face challenges, security being one of them. To mitigate this problem, Artificial Intelligence (AI)-based security solutions have recently been proposed and are receiving more attention, since they show notable results in identifying and responding to threats present in a network. Network anomaly detection, the process of monitoring network data and detecting abnormal events that may occur, is the main technique used in this work.
This work aims to research, design and build an AI model for network anomaly detection as part of the development of a Holistic Security and Privacy Framework in the context of the CHARITY and 5G-EPICENTRE EU-funded research projects. The framework is intended to automatically detect and mitigate anomalies, such as cyber-attacks, in cloud-native environments.
In this report, we review several research topics, such as cloud-native environments, network security and the state of the art of Machine Learning for Network Anomaly Detection, presenting and discussing various Machine Learning and Deep Learning approaches.
Using the Supervised Learning algorithm Random Forest as a baseline model, which achieved good performance in detecting attacks, we compared its performance with that of a Conventional Autoencoder and a Convolutional Autoencoder in detecting unknown attacks. The Random Forest performed worse, reaching an F1-Score of 27%, as it could not adequately identify attacks that were unknown to the model, whereas the Unsupervised Learning approaches achieved an F1-Score in the order of 80%.
The Conventional Autoencoder achieved an F1-Score above 69% for every data type except Email, on which its performance was poor compared with the other data types. The Convolutional Autoencoder achieved an F1-Score above 59% in all of the models trained for each data type.
Comparing the two, both autoencoders performed very similarly overall, with similar AUC scores and classification times. However, since the framework is meant to retrain the model periodically on new data so that it keeps improving, training time is a key factor, and here the Convolutional Autoencoder took much longer to train than the Conventional Autoencoder.
Description:	Master's dissertation in Data Science and Engineering presented to the Faculty of Sciences and Technology
URI:	https://hdl.handle.net/10316/107879
Rights:	openAccess
Appears in Collections:	UC - Dissertações de Mestrado
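The abstract contrasts a supervised classifier with autoencoders that flag attacks unknown to the model. A minimal sketch of the unsupervised idea follows, under illustrative assumptions that are not taken from the dissertation: synthetic two-feature "traffic" records, a linear autoencoder with a one-unit bottleneck trained by plain gradient descent, fixed initial weights, and an anomaly threshold equal to the largest reconstruction error seen on normal training data. The helper names (`encode`, `decode`, `reconstruction_error`, `train_autoencoder`, `f1_score`) are hypothetical, and the dissertation's Conventional and Convolutional Autoencoders are not reproduced here.

```python
# Minimal sketch: anomaly detection by autoencoder reconstruction error.
# Illustrative assumptions (not from the dissertation): synthetic 2-feature
# records, a linear autoencoder with a one-unit bottleneck, plain gradient
# descent, and a threshold equal to the largest error seen on normal data.

def encode(w, x):
    """Bottleneck code: a single number z = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(v, z):
    """Reconstruction x_hat = z * v."""
    return [vi * z for vi in v]


def reconstruction_error(w, v, x):
    """Squared error between a record and its reconstruction."""
    x_hat = decode(v, encode(w, x))
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))


def train_autoencoder(data, lr=0.05, epochs=500):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    w = [0.4, 0.6]  # encoder weights (fixed init so the run is reproducible)
    v = [0.5, 0.3]  # decoder weights
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_v = [0.0] * len(v)
        for x in data:
            z = encode(w, x)
            r = [xh - xi for xh, xi in zip(decode(v, z), x)]  # residual
            v_dot_r = sum(vi * ri for vi, ri in zip(v, r))
            for j in range(len(x)):
                grad_v[j] += 2 * r[j] * z / n        # dL/dv_j
                grad_w[j] += 2 * v_dot_r * x[j] / n  # dL/dw_j
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R), with label 1 meaning 'anomaly'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Normal records only: two strongly correlated features plus small noise.
ts = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
noise = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05, 0.05]
normal_train = [[t + e, t - e] for t, e in zip(ts, noise)]

w, v = train_autoencoder(normal_train)
threshold = max(reconstruction_error(w, v, x) for x in normal_train)

# Three records that follow the normal pattern, two "unseen attacks" that do not.
test = [[0.72, 0.68], [-1.18, -1.22], [0.3, 0.3], [1.0, -1.0], [0.2, 1.4]]
y_true = [0, 0, 0, 1, 1]
y_pred = [int(reconstruction_error(w, v, x) > threshold) for x in test]
print(y_pred, f1_score(y_true, y_pred))
```

Because the model is fit only on normal records and never sees a labelled attack, anything it reconstructs poorly is flagged; on this toy data the three normal-pattern records should stay below the threshold and the two off-pattern records should be flagged. This is the property that lets reconstruction-based detectors react to attacks absent from training, whereas a supervised classifier only learns the attack classes it was shown.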