Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline
Université de Toulon, Aix Marseille University, CNRS, LIS, Toulon, France.
Escuela de Biología & Centro de Investigación en Neurociencias, Universidad de Costa Rica, San Pedro, Costa Rica.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland. ORCID iD: 0000-0002-6739-0838
National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain; Centre de Recherche sur la Biodiversité et l’Environnement (UMR 5300 CNRS-IRD-TINPT-UPS), Université Paul Sabatier, Toulouse, France; Facultad de Ciencias, Universidad Autónoma de Madrid, Madrid, Spain.
2025 (English). In: Bioacoustics, ISSN 0952-4622, E-ISSN 2165-0586, Vol. 34, no 4, p. 419-446
Article in journal (Refereed), Published
Abstract [en]

The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance when defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, extracting F0 manually is too laborious, and automating the task is complex. Despite significant advances in automatic F0 estimation for speech and music, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infrasound to ultrasound, from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. In addition, to indicate how well algorithms apply to a given set of signals, we propose spectral measurements of F0 quality which correlate well with performance. While current performance is not satisfactory for all studied taxa, the results suggest that deep learning could yield a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.

Place, publisher, year, edition, pages
Informa UK Limited, 2025. Vol. 34, no 4, p. 419-446
Keywords [en]
cross-species dataset, deep learning, Fundamental frequency (F0), vocalisation analysis
National Category
Artificial Intelligence
Identifiers
URN: urn:nbn:se:kth:diva-366189
DOI: 10.1080/09524622.2025.2500380
ISI: 001501315800001
Scopus ID: 2-s2.0-105007437974
OAI: oai:DiVA.org:kth-366189
DiVA, id: diva2:1981650
Note

QC 20250704

Available from: 2025-07-04 Created: 2025-07-04 Last updated: 2025-08-15
Bibliographically approved

Authority records

Ekström, Axel G.
