Please use this identifier to cite or link to this item: https://hdl.handle.net/10316/103726
DC Field | Value | Language
dc.contributor.author | Assunção, Gustavo | -
dc.contributor.author | Gonçalves, Nuno | -
dc.contributor.author | Menezes, Paulo | -
dc.date.accessioned | 2022-11-23T11:48:12Z | -
dc.date.available | 2022-11-23T11:48:12Z | -
dc.date.issued | 2020-02-28 | -
dc.identifier.issn | 2076-3417 | pt
dc.identifier.uri | https://hdl.handle.net/10316/103726 | -
dc.description.abstract | Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations. | pt
dc.language.iso | eng | pt
dc.publisher | MDPI AG | pt
dc.relation | FCT - scholarship 2020.05620.BD | pt
dc.relation | UIDB/00048/2020 | pt
dc.rights | openAccess | pt
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | pt
dc.subject | artificial neural networks | pt
dc.subject | multi-modal perception | pt
dc.subject | human–robot interaction | pt
dc.title | Bio-Inspired Modality Fusion for Active Speaker Detection | pt
dc.type | article | -
degois.publication.firstPage | 3397 | pt
degois.publication.issue | 8 | pt
degois.publication.title | Applied Sciences (Switzerland) | pt
dc.peerreviewed | yes | pt
dc.identifier.doi | 10.3390/app11083397 | pt
degois.publication.volume | 11 | pt
dc.date.embargo | 2020-02-28 | *
uc.date.periodoEmbargo | 0 | pt
item.fulltext | Com Texto completo | -
item.grantfulltext | open | -
item.languageiso639-1 | en | -
item.cerifentitytype | Publications | -
item.openairetype | article | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
crisitem.project.grantno | INSTITUTE OF SYSTEMS AND ROBOTICS - ISR - COIMBRA | -
crisitem.author.researchunit | ISR - Institute of Systems and Robotics | -
crisitem.author.parentresearchunit | University of Coimbra | -
crisitem.author.orcid | 0000-0003-4015-4111 | -
crisitem.author.orcid | 0000-0002-1854-049X | -
crisitem.author.orcid | 0000-0002-4903-3554 | -
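The abstract above outlines the detection pipeline only at a high level: two specialized unimodal networks produce auditory and visual embeddings, which a superior-colliculus-inspired layer fuses before classification. The following minimal PyTorch sketch illustrates that general structure under assumed feature dimensions and a toy outer-product fusion rule; the class names, layer sizes, and fusion computation are illustrative assumptions, not the authors' published architecture.

```python
# Hypothetical sketch of the audio-visual fusion pipeline described in the abstract:
# two unimodal encoders produce embeddings, a fusion layer loosely emulating
# cross-mapping of perceptual fields combines them, and a head scores whether a
# candidate face is the active speaker. All dimensions are assumptions.
import torch
import torch.nn as nn


class AudioEncoder(nn.Module):
    """Maps an MFCC-like audio feature vector to a fixed-size embedding (assumed shape)."""
    def __init__(self, in_dim: int = 40, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, x):
        return self.net(x)


class VisualEncoder(nn.Module):
    """Maps a flattened face-crop feature vector to a fixed-size embedding (assumed shape)."""
    def __init__(self, in_dim: int = 512, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, x):
        return self.net(x)


class CrossMappingFusion(nn.Module):
    """Toy stand-in for the superior-colliculus-inspired layer: an outer product of the
    two unimodal embeddings crudely emulates cross-mapping between perceptual fields."""
    def __init__(self, emb_dim: int = 64, out_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(emb_dim * emb_dim, out_dim)

    def forward(self, a, v):
        cross = torch.einsum("bi,bj->bij", a, v).flatten(1)  # (batch, emb_dim * emb_dim)
        return torch.relu(self.proj(cross))


class ActiveSpeakerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.audio, self.visual = AudioEncoder(), VisualEncoder()
        self.fusion = CrossMappingFusion()
        self.head = nn.Linear(128, 1)  # logit: is this face the active speaker?

    def forward(self, audio_feats, visual_feats):
        return self.head(self.fusion(self.audio(audio_feats), self.visual(visual_feats)))


# Usage with random stand-in features for a batch of 8 candidate faces.
model = ActiveSpeakerNet()
logits = model(torch.randn(8, 40), torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 1])
```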
Appears in Collections:FCTUC Eng.Electrotécnica - Artigos em Revistas Internacionais
I&D ISR - Artigos em Revistas Internacionais
This item is licensed under a Creative Commons License.