NAAQA: A Neural Architecture for Acoustic Question Answering
Abdelnour, Jerome. NECOTIS, Dept. of Electrical and Computer Engineering, Sherbrooke University, Canada. ORCID iD: 0000-0002-7931-5966
Rouat, Jean. NECOTIS, Dept. of Electrical and Computer Engineering, Sherbrooke University, Canada. ORCID iD: 0000-0002-9306-426X
Salvi, Giampiero. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. Department of Electronic Systems, Norwegian University of Science and Technology, Norway. ORCID iD: 0000-0002-3323-5311
2023 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no. 4, pp. 4997-5009. Journal article (peer-reviewed). Published
Abstract [en]

The goal of the Acoustic Question Answering (AQA) task is to answer a free-form text question about the content of an acoustic scene. It was inspired by the Visual Question Answering (VQA) task. In this paper, based on the previously introduced CLEAR dataset, we propose a new benchmark for AQA, namely CLEAR2, that emphasizes the specific challenges of acoustic inputs. These include handling variable-duration scenes, and scenes built with elementary sounds that differ between the training and test sets. We also introduce NAAQA, a neural architecture that leverages specific properties of acoustic inputs. The use of 1D convolutions in time and frequency to process 2D spectro-temporal representations of acoustic content shows promising results and enables reductions in model complexity. We show that time coordinate maps improve temporal localization, which raises the performance of the network by ∼17 percentage points. Frequency coordinate maps, on the other hand, have little influence on this task. NAAQA achieves 79.5% accuracy on the AQA task with roughly four times fewer parameters than the previously explored VQA model. We evaluate the performance of NAAQA on an independent data set reconstructed from DAQA. We also test the addition of a MALiMo module to our model on both CLEAR2 and DAQA. We provide a detailed analysis of the results for the different question types. We release the code to produce CLEAR2 as well as NAAQA to foster research in this newly emerging machine learning task.
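
The two ideas named in the abstract, time coordinate maps and 1D convolutions applied along time and along frequency, can be illustrated with a small toy sketch. This is not the released NAAQA code: the function names, the [-1, 1] scaling of the coordinate map, and the order "time first, then frequency" are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
def time_coordinate_map(n_freq, n_time):
    """Grid of shape n_freq x n_time whose value depends only on the time index.

    Values are scaled linearly to [-1, 1]; this range is an assumption,
    not something stated in the abstract.
    """
    if n_time == 1:
        row = [0.0]
    else:
        row = [2.0 * t / (n_time - 1) - 1.0 for t in range(n_time)]
    return [list(row) for _ in range(n_freq)]


def conv1d_valid(seq, kernel):
    """Plain 1D cross-correlation without padding ('valid' mode)."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]


def conv_time_then_freq(spec, k_time, k_freq):
    """Apply a 1D kernel along time (each row), then a 1D kernel along
    frequency (each column), in place of a single 2D kernel."""
    after_time = [conv1d_valid(row, k_time) for row in spec]
    n_cols = len(after_time[0])
    columns = [conv1d_valid([r[c] for r in after_time], k_freq)
               for c in range(n_cols)]
    # columns is indexed [time][freq]; transpose back to [freq][time].
    return [[columns[c][f] for c in range(n_cols)]
            for f in range(len(columns[0]))]


# Toy "spectrogram": 3 frequency bins x 5 time frames (made-up values).
spec = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [0.0, 1.0, 0.0, 1.0, 0.0],
        [2.0, 2.0, 2.0, 2.0, 2.0]]

time_map = time_coordinate_map(3, 5)
stacked = [spec, time_map]  # two input channels: spectrogram + time coordinates
out = conv_time_then_freq(spec, [1.0, 1.0], [1.0, 1.0])
```

In this sketch a pair of 1D kernels of lengths k_t and k_f carries k_t + k_f weights, whereas one full 2D kernel of the same extent carries k_t × k_f, which gives an intuition for how 1D convolutions can reduce model complexity. The extra `time_map` channel gives every position an explicit indication of where it lies in time.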

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. Vol. 45, no. 4, pp. 4997-5009
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-324766
DOI: 10.1109/tpami.2022.3194311
ISI: 000947840300064
PubMed ID: 36121954
Scopus ID: 2-s2.0-85139450848
OAI: oai:DiVA.org:kth-324766
DiVA, id: diva2:1743465
Projects
IGLU
Note

QC 20250611

Available from: 2023-03-15. Created: 2023-03-15. Last updated: 2025-06-11. Bibliographically approved

Open Access in DiVA

fulltext (1863 kB), 362 downloads
File information
File: FULLTEXT01.pdf. File size: 1863 kB. Checksum: SHA-512
b674917ed46a4f66c665f1387c7b96483a18c157503eee127cf541213f04e9e95e875a361a951c5723b433ec52e4e333d6f9f3187c29632dced82ef083c29e1d
Type: fulltext. MIME type: application/pdf
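
The SHA-512 checksum listed above can be used to check that a downloaded copy of the full text is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha512_of_file` and `matches_record` are hypothetical, and the local file name is whatever the reader saved the PDF as.

```python
import hashlib
import hmac

# Checksum copied from the file information in this record.
EXPECTED_SHA512 = (
    "b674917ed46a4f66c665f1387c7b96483a18c157503eee127cf541213f04e9e9"
    "5e875a361a951c5723b433ec52e4e333d6f9f3187c29632dced82ef083c29e1d"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_SHA512):
    """True if the file's SHA-512 digest equals the expected hex string."""
    return hmac.compare_digest(sha512_of_file(path), expected.lower())
```

For example, `matches_record("FULLTEXT01.pdf")` should return `True` for an unmodified download, assuming the file was saved under that name in the current directory.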

Other links

Publisher's full text, PubMed, Scopus

Person

Salvi, Giampiero

Search in DiVA

By author/editor
Abdelnour, Jerome; Rouat, Jean; Salvi, Giampiero