Audiopedia: Audio QA with Knowledge
Indian Institute of Technology, Jodhpur. ORCID iD: 0000-0003-3646-8492
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0002-7414-845X
Indian Institute of Technology, Jodhpur.
2025 (English). In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025, Hyderabad, India, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces Audiopedia, a novel Audio QA task that requires both audio comprehension and reasoning over external knowledge. The task comprises three subtasks: s-AQA, m-AQA, and r-AQA. The paper also proposes a framework that combines Audio Entity Linking (AEL) with a Knowledge-Augmented Audio Multimodal Model (KA2LM) to enhance large audio language models on knowledge-intensive tasks.

Place, publisher, year, edition, pages
Hyderabad, India, 2025.
Keywords [en]
audio question answering, knowledge-intensive questions, audio entity linking
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-358060
OAI: oai:DiVA.org:kth-358060
DiVA, id: diva2:1924388
Conference
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025, Hyderabad, India, April 6-11, 2025
Available from: 2025-01-05
Created: 2025-01-05
Last updated: 2025-01-14
Bibliographically approved

Open Access in DiVA

Audiopedia: Audio QA with Knowledge (634 kB), 106 downloads
File information
File name: FULLTEXT01.pdf
File size: 634 kB
Checksum (SHA-512):
81d2c42fc56e0f66557f7ecb2a06befa615e61a54ffc75147131ecacd44fd82e5bdd8b8bfd0e6fcafb2b26f3b82f3b3009baa5e359de19a68b47cef760ed4cf7
Type: fulltext
Mimetype: application/pdf

Authority records

Chhatre, Kiran

Search in DiVA

By author/editor
Penamakuri, Abhirama Subramanyam
Chhatre, Kiran
By organisation
Computational Science and Technology (CST)
Computer Systems

Total: 106 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 498 hits