Publications (10 of 21)
Best, P., Araya-Salas, M., Ekström, A. G., Freitas, B., Jensen, F. H., Kershenbaum, A., . . . Marxer, R. (2025). Bioacoustic fundamental frequency estimation: a cross-species dataset and deep learning baseline. Bioacoustics, 34(4), 419-446
2025 (English). In: Bioacoustics, ISSN 0952-4622, E-ISSN 2165-0586, Vol. 34, no 4, p. 419-446. Article in journal (Refereed) Published
Abstract [en]

The fundamental frequency (F0) is a key parameter for characterising structures in vertebrate vocalisations, for instance defining vocal repertoires and their variations at different biological scales (e.g. population dialects, individual signatures). However, the task is too laborious to perform manually, and its automation is complex. Despite significant advancements in the fields of speech and music for automatic F0 estimation, similar progress in bioacoustics has been limited. To address this gap, we compile and publish a benchmark dataset of over 250,000 calls from 14 taxa, each paired with ground truth F0 values. These vocalisations range from infra-sounds to ultra-sounds, from high to low harmonicity, and some include non-linear phenomena. Testing different algorithms on these signals, we demonstrate the potential of neural networks for F0 estimation, even for taxa not seen in training, or when trained without labels. Also, to inform on the applicability of algorithms to analyse signals, we propose spectral measurements of F0 quality which correlate well with performance. While current performance results are not satisfying for all studied taxa, they suggest that deep learning could bring a more generic and reliable bioacoustic F0 tracker, helping the community to analyse vocalisations via their F0 contours.

Place, publisher, year, edition, pages
Informa UK Limited, 2025
Keywords
cross-species dataset, deep learning, Fundamental frequency (F0), vocalisation analysis
National Category
Artificial Intelligence
Identifiers
urn:nbn:se:kth:diva-366189 (URN); 10.1080/09524622.2025.2500380 (DOI); 001501315800001 (); 2-s2.0-105007437974 (Scopus ID)
Note

QC 20250704

Available from: 2025-07-04. Created: 2025-07-04. Last updated: 2025-07-04. Bibliographically approved
Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18, Article ID e14285.
2024 (English). In: Biolinguistics, ISSN 1450-3417, Vol. 18, article id e14285. Article in journal (Refereed) Published
Abstract [en]

Recent literature argues that a purportedly long-standing theory—so-called “laryngeal descent theory”—in speech evolution has been refuted (Boë et al., 2019, https://doi.org/10.1126/sciadv.aaw3916). However, an investigation into the relevant source material reveals that the theory described has never been a prominent line of thinking in speech-centric sciences. The confusion arises from a fundamental misunderstanding: the argument that the descent of the larynx and the accompanying changes in the hominin vocal tract expanded the range of possible speech sounds for human ancestors (a theory that enjoys wide interdisciplinary support) is mistakenly interpreted as a belief that all speech was impossible without such changes—a notion that was never widely endorsed in relevant literature. This work aims not to stir controversy but to highlight important historical context in the study of speech evolution.

Place, publisher, year, edition, pages
Leibniz Institute for Psychology (ZPID), 2024
Keywords
evolution of speech, miscitation, primatology, speech production, vocal tract
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-346815 (URN); 10.5964/bioling.14285 (DOI); 001237520500001 (); 2-s2.0-85192800645 (Scopus ID)
Note

QC 20240617

Available from: 2024-05-24. Created: 2024-05-24. Last updated: 2024-06-17. Bibliographically approved
Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024). Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech. Scientific Reports, 14(1), Article ID 17135.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no 1, article id 17135. Article in journal (Refereed) Published
Abstract [en]

Nonhuman great apes have been claimed to be unable to learn human words due to a lack of the necessary neural circuitry. We recovered original footage of two enculturated chimpanzees uttering the word “mama” and subjected recordings to phonetic analysis. Our analyses demonstrate that chimpanzees are capable of syllabic production, achieving consonant-to-vowel phonetic contrasts via the simultaneous recruitment and coupling of voice, jaw and lips. In an online experiment, human listeners naive to the recordings’ origins reliably perceived chimpanzee utterances as syllabic utterances, primarily as “ma-ma”, among foil syllables. Our findings demonstrate that in the absence of direct data-driven examination, great ape vocal production capacities have been underestimated. Chimpanzees possess the neural building blocks necessary for speech.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Phonetics, Primatology, Vocal learning
National Category
Zoology
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-351240 (URN); 10.1038/s41598-024-67005-w (DOI); 001278002800007 (); 39054330 (PubMedID); 2-s2.0-85199430867 (Scopus ID)
Funder
Swedish Research Council, 2017-00626; KTH Royal Institute of Technology
Note

QC 20240805

Available from: 2024-08-04. Created: 2024-08-04. Last updated: 2024-08-27. Bibliographically approved
Ekström, A. G. (2024). Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934−2022). American Journal of Primatology, 86(8)
2024 (English). In: American Journal of Primatology, ISSN 0275-2565, E-ISSN 1098-2345, Vol. 86, no 8. Article, review/survey (Refereed) Published
Abstract [en]

The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934−2022) is considered at length, and two research papers—both purported challenges to Lieberman's theoretical work—and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would‐be “speech‐ready” capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position—that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel‐like qualities of baboon calls to articulatory capacities based on audio data; I argue that such “protovocalic” properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term “laryngeal descent theory”) to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel‐like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the “Lieberman account” of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species‐unique anatomy.

Place, publisher, year, edition, pages
Wiley-Blackwell, 2024
Keywords
Phonetics
National Category
Languages and Literature
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-351239 (URN); 10.1002/ajp.23637 (DOI); 38741274 (PubMedID); 2-s2.0-85192844535 (Scopus ID)
Note

QC 20240805

Available from: 2024-08-04. Created: 2024-08-04. Last updated: 2025-05-27. Bibliographically approved
Ekström, A. G. & Edlund, J. (2024). Sketches of chimpanzee (Pan troglodytes) hoo’s: vowels by any other name? Primates, 65(2), 81-88
2024 (English). In: Primates, ISSN 0032-8332, E-ISSN 1610-7365, Vol. 65, no 2, p. 81-88. Article in journal (Refereed) Published
Abstract [en]

In human speech, the close back rounded vowel /u/ (the vowel in “boot”) is articulated with the tongue arched toward the dorsal boundary of the hard palate, with the pharyngeal cavity open. Acoustic and perceptual properties of chimpanzee (Pan troglodytes) hoo’s are similar to those of the human vowel /u/. However, the vocal tract morphology of chimpanzees likely limits their phonetic capabilities, so that it is unlikely, or even impossible, that their articulation is comparable to that of a human. To determine how qualities of the vowel /u/ may be achieved given the chimpanzee vocal tract, we calculated transfer functions of the vocal tract area for tube models of vocal tract configurations in which vocal tract length, length and area of a laryngeal air sac simulacrum, length of lip protrusion, and area of lip opening were systematically varied. The method described is principally acoustic; we make no claim as to the actual shape of the chimpanzee vocal tract during call production. Nonetheless, we demonstrate that it may be possible to achieve the acoustic and perceptual qualities of back vowels without a reconfigured human vocal tract. The results, while tentative, suggest that the production of hoo’s by chimpanzees, while achieving comparable vowel-like qualities to the human /u/, may involve articulatory gestures that are beyond the range of the human articulators. The purpose of this study was to (1) stimulate further simulation research on great ape articulation, and (2) show that apparently vowel-like phenomena in nature are not necessarily indicative of evolutionary continuity per se.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Articulatory phonetics, Primatology, Speech acoustics, Vowel quality
National Category
Comparative Language Studies and Linguistics; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-367104 (URN); 10.1007/s10329-023-01107-3 (DOI); 001126347800001 (); 38110671 (PubMedID); 2-s2.0-85180178929 (Scopus ID)
Note

QC 20250715

Available from: 2025-07-15. Created: 2025-07-15. Last updated: 2025-07-15. Bibliographically approved
Ekström, A. G., Madison, G., Olsson, E. J. & Tsapos, M. (2024). The search query filter bubble: effect of user ideology on political leaning of search results through query selection. Information, Communication and Society, 27(5), 878-894
2024 (English). In: Information, Communication and Society, ISSN 1369-118X, E-ISSN 1468-4462, Vol. 27, no 5, p. 878-894. Article in journal (Refereed) Published
Abstract [en]

It is commonly assumed that personalization technologies used by Google for the purpose of tailoring search results for individual users create filter bubbles, which reinforce users’ political views. Surprisingly, empirical evidence for a personalization-induced filter bubble has not been forthcoming. Here, we investigate whether filter bubbles may result instead from a searcher’s choice of search queries. In the first experiment, participants rated the left-right leaning of 48 queries (search strings), 6 for each of 8 topics (abortion, benefits, climate change, sex equality, immigration, nuclear family, Islam, and taxation). An independent sample of participants were then asked to select one of these queries for each of the 8 topics. With the exception of the topic of Islam, participants were significantly more likely to select a query corresponding to their own political leaning, compared to other queries, explaining between 12% and 39% of the variance. A second experiment investigated the effect of the political leaning of the same queries on the overall political leaning of Search Engine Result Pages (SERPs) in Google Search. The top six results of each SERP were rated collectively by a third group of participants, explaining 36.3% of the variance across all 48 search terms (p < .00001). That is, (1) participants in our experiments tended to select own-side search queries, and (2) using those queries tended to yield own-side search results when using the Google search engine. Our results are consistent with the notion of a self-imposed filter bubble in which query selection plays a salient role.

Place, publisher, year, edition, pages
Informa UK Limited, 2024
Keywords
Filter bubble, Google, ideology, online search, political leaning, search query
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-350092 (URN); 10.1080/1369118X.2023.2230242 (DOI); 001021400700001 (); 2-s2.0-85164525870 (Scopus ID)
Note

QC 20240807

Available from: 2024-08-07. Created: 2024-08-07. Last updated: 2024-08-07. Bibliographically approved
Feindt, K., Rossi, M., Esfandiari-Baiat, G., Ekström, A. G. & Zellers, M. (2023). Cues to next-speaker projection in conversational Swedish: Evidence from reaction times. In: Interspeech 2023. Paper presented at 24th International Speech Communication Association, Interspeech 2023, Dublin, Ireland, Aug 20 2023 - Aug 24 2023 (pp. 1040-1044). International Speech Communication Association
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, p. 1040-1044. Conference paper, Published paper (Refereed)
Abstract [en]

We present first results of a study investigating the salience and typicality of prosodic markers in Swedish at turn ends for turn-yielding and turn-keeping purposes. We performed an experiment where participants (N=32) were presented with conversational chunks and, after the audio ended, were asked to determine which of two speakers would speak next by clicking a picture on a screen. Audio stimuli were manipulated by (i) raising and (ii) lowering f0 over the last 500 ms of a turn, (iii) speeding up or (iv) slowing down duration over the last 500 ms, and (v) raising and (vi) lowering the last pitch peak. In our data, out of all manipulations, increasing the speech rate was found to be the most disruptive (p < .005). Higher speech rate led to longer reaction times in turn-keeping, which were shorter in turn-yielding. Other manipulations did not significantly alter reaction times. The results presented here may be complemented with eye movement data, to further elucidate cognitive mechanisms underlying turn-taking behavior.

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Keywords
conversational dynamics, gaze, paralinguistics, prosody, Swedish, turn-taking
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-337870 (URN); 10.21437/Interspeech.2023-778 (DOI); 001186650301039 (); 2-s2.0-85171547649 (Scopus ID)
Conference
24th International Speech Communication Association, Interspeech 2023, Dublin, Ireland, Aug 20 2023 - Aug 24 2023
Note

QC 20241011

Available from: 2023-10-10. Created: 2023-10-10. Last updated: 2024-10-11. Bibliographically approved
Ekström, A. G. & Edlund, J. (2023). Evolution of the human tongue and emergence of speech biomechanics. Frontiers in Psychology, 14, Article ID 1150778.
2023 (English). In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 14, article id 1150778. Article, review/survey (Refereed) Published
Abstract [en]

The tongue is one of the organs most central to human speech. Here, the evolution and species-unique properties of the human tongue are traced, via reference to the apparent articulatory behavior of extant non-human great apes, and fossil findings from early hominids - from a point of view of articulatory phonetics, the science of human speech production. Increased lingual flexibility provided the possibility of mapping of articulatory targets, possibly via exaptation of manual-gestural mapping capacities evident in extant great apes. The emergence of the human-specific tongue, its properties, and morphology were crucial to the evolution of human articulate speech.

Place, publisher, year, edition, pages
Frontiers Media SA, 2023
Keywords
evolution of speech, speech articulation, human evolution, speech production, primatology, articulatory phonetics, coarticulation, speech motor control
National Category
Other Medical Sciences not elsewhere specified
Identifiers
urn:nbn:se:kth:diva-330517 (URN); 10.3389/fpsyg.2023.1150778 (DOI); 001004893900001 (); 37325743 (PubMedID); 2-s2.0-85162047256 (Scopus ID)
Note

QC 20230630

Available from: 2023-06-30. Created: 2023-06-30. Last updated: 2023-06-30. Bibliographically approved
Ekström, A. G. (2023). Predicting linguistic universality through reverse engineering. Nature Reviews Psychology, 2(10), 587
2023 (English). In: Nature Reviews Psychology, E-ISSN 2731-0574, Vol. 2, no 10, p. 587. Article in journal (Refereed) Published
Place, publisher, year, edition, pages
Springer Nature, 2023
National Category
Psychology
Identifiers
urn:nbn:se:kth:diva-338654 (URN); 10.1038/s44159-023-00228-2 (DOI); 001156831800001 (); 2-s2.0-85167835050 (Scopus ID)
Note

QC 20231023

Available from: 2023-10-23. Created: 2023-10-23. Last updated: 2024-03-05. Bibliographically approved
Ekström, A. G., Moran, S., Sundberg, J. & Lameira, A. R. (2023). PREQUEL: SUPERVISED PHONETIC APPROACHES TO ANALYSES OF GREAT APE QUASI-VOWELS. In: ICPhS 2023. Paper presented at ICPhS 2023, August 7-11, Prague, Czech Republic.
2023 (English). In: ICPhS 2023, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

There is renewed interest in potential vowel production by nonhuman primates, but no agreed-upon methodologies for its estimation from real-life vocalizations. Here, we present a set of supervised approaches for estimating primate vowel-like articulation, with reference to orangutan long call pulses (N=36). We summarize our approach as a cohesive framework, the Primate Quasi-Vowel (PREQUEL) protocol. We (1) estimated f0 from correlograms and (2) vocal tract resonances (formants) from spectrograms, (3) the results of which were then compared against synthesized vowels for those frequency values; and (4) presented to uninformed listeners (N=16), who largely agreed on the categorization of vowel-like qualities for vocalizations (Cronbach’s alpha=.701). We also provide descriptions of methods that are seemingly inadequate for formant estimation in great ape calls. We argue that a combination of phonetic methods is required to develop a science of nonhuman primate articulation.

National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-351247 (URN)
Conference
ICPhS 2023, August 7-11, Prague, Czech Republic
Note

QC 20240805

Available from: 2024-08-04. Created: 2024-08-04. Last updated: 2024-08-05. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-6739-0838