NAAQA: A Neural Architecture for Acoustic Question Answering
NECOTIS, Dept. of Electrical and Computer Engineering, Sherbrooke University, Canada. ORCID iD: 0000-0002-7931-5966
NECOTIS, Dept. of Electrical and Computer Engineering, Sherbrooke University, Canada. ORCID iD: 0000-0002-9306-426X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. Department of Electronic Systems, Norwegian University of Science and Technology, Norway. ORCID iD: 0000-0002-3323-5311
2023 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no. 4, pp. 4997-5009. Journal article (peer-reviewed), published.
Abstract

The goal of the Acoustic Question Answering (AQA) task is to answer a free-form text question about the content of an acoustic scene. The task was inspired by Visual Question Answering (VQA). In this paper, building on the previously introduced CLEAR dataset, we propose a new AQA benchmark, CLEAR2, that emphasizes the challenges specific to acoustic inputs. These include handling scenes of variable duration and scenes built from elementary sounds that differ between the training and test sets. We also introduce NAAQA, a neural architecture that exploits specific properties of acoustic inputs. Processing 2D spectro-temporal representations of the acoustic content with 1D convolutions in time and in frequency shows promising results and reduces model complexity. We show that time coordinate maps improve temporal localization, which raises the network's performance by about 17 percentage points. Frequency coordinate maps, on the other hand, have little influence on this task. NAAQA reaches 79.5% accuracy on the AQA task with about four times fewer parameters than the previously explored VQA model. We evaluate NAAQA on an independent dataset reconstructed from DAQA, and we test adding a MALiMo module to our model on both CLEAR2 and DAQA. We provide a detailed analysis of the results for the different question types. We release the code to produce CLEAR2, as well as NAAQA, to foster research in this newly emerging machine learning task.
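The abstract credits time coordinate maps with a gain of about 17 percentage points and 1D convolutions with lower model complexity. The following is a minimal sketch of both ideas under stated assumptions, not NAAQA's implementation: it models a spectrogram as a nested list (frequency rows by time columns), builds CoordConv-style coordinate channels scaled to [-1, 1], and counts convolution weights for a 2D kernel versus a frequency-then-time pair of 1D kernels. The helper names, the [-1, 1] scaling, and the 64-channel 3x3 sizes are hypothetical choices for illustration.

```python
from typing import List

# Assumption for this sketch: a spectrogram is a nested list with one row per
# frequency bin and one column per time frame.
Matrix = List[List[float]]


def _linear_positions(n: int) -> List[float]:
    """Map indices 0..n-1 linearly onto [-1, 1] (CoordConv-style convention)."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def time_coordinate_map(n_freq: int, n_time: int) -> Matrix:
    """Every row is identical: the value depends only on the time index."""
    row = _linear_positions(n_time)
    return [list(row) for _ in range(n_freq)]


def frequency_coordinate_map(n_freq: int, n_time: int) -> Matrix:
    """Every column is identical: the value depends only on the frequency index."""
    return [[value] * n_time for value in _linear_positions(n_freq)]


def add_coordinate_channels(
    spectrogram: Matrix, use_time: bool = True, use_freq: bool = False
) -> List[Matrix]:
    """Stack the spectrogram with optional coordinate maps as extra input channels."""
    n_freq, n_time = len(spectrogram), len(spectrogram[0])
    channels = [spectrogram]
    if use_time:
        channels.append(time_coordinate_map(n_freq, n_time))
    if use_freq:
        channels.append(frequency_coordinate_map(n_freq, n_time))
    return channels


def conv_weight_count(c_in: int, c_out: int, kernel_h: int, kernel_w: int) -> int:
    """Number of weights in one convolution layer (biases ignored)."""
    return c_in * c_out * kernel_h * kernel_w


if __name__ == "__main__":
    spec = [[0.0] * 5 for _ in range(3)]  # 3 frequency bins x 5 time frames
    chans = add_coordinate_channels(spec, use_time=True, use_freq=True)
    print(len(chans))   # 3 channels: spectrogram, time map, frequency map
    print(chans[1][0])  # [-1.0, -0.5, 0.0, 0.5, 1.0]

    # Hypothetical sizes: a 3x3 2D kernel versus a 3x1 (frequency) kernel
    # followed by a 1x3 (time) kernel, with 64 channels throughout.
    full_2d = conv_weight_count(64, 64, 3, 3)
    factorized = conv_weight_count(64, 64, 3, 1) + conv_weight_count(64, 64, 1, 3)
    print(full_2d, factorized)  # 36864 24576
```

Because the time map is rescaled to [-1, 1] for every input, it adapts to scenes of variable duration, though it then encodes relative rather than absolute position; the sketch makes no claim about which convention NAAQA uses.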

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. Vol. 45, no. 4, pp. 4997-5009
National subject category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-324766
DOI: 10.1109/tpami.2022.3194311
ISI: 000947840300064
PubMed ID: 36121954
Scopus ID: 2-s2.0-85139450848
OAI: oai:DiVA.org:kth-324766
DiVA id: diva2:1743465
Project
IGLU
Note

QC 20250611

Available from: 2023-03-15. Created: 2023-03-15. Last updated: 2025-06-11. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT01.pdf (1863 kB, application/pdf)
SHA-512 checksum:
b674917ed46a4f66c665f1387c7b96483a18c157503eee127cf541213f04e9e95e875a361a951c5723b433ec52e4e333d6f9f3187c29632dced82ef083c29e1d


Authors
Abdelnour, Jerome; Rouat, Jean; Salvi, Giampiero
