Enhancing Indoor Human Detection: A Comprehensive Study of YOLOv5 Algorithm with Thermal Imagery

Palma, Disa Alexandra Queiroz

Use this identifier to cite this record: https://hdl.handle.net/10316/110532

Title:	Enhancing Indoor Human Detection: A Comprehensive Study of YOLOv5 Algorithm with Thermal Imagery
Other titles:	Aperfeiçoamento de Deteção de Pessoas em Ambientes Interiores: Um Estudo Abrangente da Aplicação do Algoritmo YOLOv5 em Imagens Térmicas
Author:	Palma, Disa Alexandra Queiroz
Advisor:	Premebida, Cristiano
Keywords:	Object detection; Transfer learning; Deep learning; YOLO; Thermal images
Date:	26-Sep-2023
Title of journal, periodical, book, or event:	Enhancing Indoor Human Detection: A Comprehensive Study of YOLOv5 Algorithm with Thermal Imagery
Place of publication or event:	DEEC
Abstract:	Object detection has a wide range of applications: it can assist in agriculture, help detect masses for cancer diagnosis, enable autonomous driving and robotic perception, or help protect against home intruders. It has shown significant success with RGB data. However, RGB data performs poorly in low-light conditions and demands a substantial amount of storage. One solution could be thermal images, which require less space and are more robust to varying lighting conditions. This thesis therefore explores the application of YOLOv5 with transfer learning and fine-tuning to a thermal indoor dataset (the target dataset). These deep learning strategies help address data scarcity and reduce training costs by taking knowledge learned previously from similar datasets and applying it to a target dataset. In this study, the model is fine-tuned on the target dataset starting from four pre-training modalities: two RGB datasets (COCO and the RGB format of the target dataset), one thermal dataset (FLIR), and the grayscale format of the target dataset. The results are compared with those of Control, which refers to training on the target dataset from scratch. Fine-tuning is carried out under a variety of conditions, including different learning rates, data augmentation techniques, frozen layers, and the SGD and Adam optimizers. The investigation concluded that pre-training on COCO achieves the highest mAP@0.5 regardless of the training conditions, surpassing the RGB format of the target dataset and FLIR. The study suggests this may be because COCO's size and diversity lead to greater generalization. Although the other parameters have an impact and can help improve results, the size of the pre-training dataset and the diversity it contains were the variables with the most influence.
Description:	Project work for the Mestrado em Engenharia Biomédica (Master's in Biomedical Engineering), presented to the Faculdade de Ciências e Tecnologia
URI:	https://hdl.handle.net/10316/110532
Rights:	openAccess
Appears in collections:	UC - Dissertações de Mestrado
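
Note on mAP@0.5: under this metric, a predicted box counts as a match only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that overlap test only. It is not taken from the thesis, and the function names `iou` and `is_match` and the corner-coordinate box format `(x1, y1, x2, y2)` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """True if the predicted box overlaps the ground truth enough to count."""
    return iou(pred_box, true_box) >= threshold
```

The full metric additionally ranks detections by confidence, computes average precision per class, and averages over classes; only the matching criterion is sketched here.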