A nativeness classifier for TED Talks
2011 (English)
Conference paper (Refereed)
This paper presents a nativeness classifier for English. The detector was developed and tested with TED Talks collected from the web, where the major non-native cues are in terms of segmental aspects and prosody. The first experiments were made using only acoustic features, with Gaussian supervectors for training a classifier based on support vector machines. These experiments resulted in an equal error rate of 13.11%. The following experiments based on prosodic features alone did not yield good results. However, a fused system, combining acoustic and prosodic cues, achieved an equal error rate of 10.58%. A small human benchmark was conducted, showing an inter-rater agreement of 0.88. This value is also very close to the agreement value between humans and the best fused system.
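The equal error rate (EER) quoted in the abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure could be computed from classifier scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the label convention (1 = native, 0 = non-native) and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from scores and binary labels.

    Assumed convention (not from the paper): label 1 = native,
    label 0 = non-native, and a higher score means "more native".
    """
    native = [s for s, y in zip(scores, labels) if y == 1]
    non_native = [s for s, y in zip(scores, labels) if y == 0]
    if not native or not non_native:
        raise ValueError("both classes must be present")

    best_gap = None
    eer = None
    # Try every observed score as a decision threshold.
    for threshold in sorted(set(scores)):
        # False rejection: a native talk scored below the threshold.
        frr = sum(s < threshold for s in native) / len(native)
        # False acceptance: a non-native talk scored at or above it.
        far = sum(s >= threshold for s in non_native) / len(non_native)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            # With finite data the two rates rarely match exactly,
            # so report their mean at the closest crossing point.
            eer = (far + frr) / 2
    return eer


# Toy scores for illustration only; these are not data from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

On real data the threshold sweep is usually done over a much larger score set, and interpolating between adjacent thresholds gives a smoother estimate than the nearest-crossing mean used here.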
Place, publisher, year, edition, pages
IEEE conference proceedings, 2011. 5672-5675 p.
Non-native accent; pronunciation
Identifiers
URN: urn:nbn:se:kth:diva-204007
DOI: 10.1109/ICASSP.2011.5947647
ScopusID: 2-s2.0-80051638158
OAI: oai:DiVA.org:kth-204007
DiVA: diva2:1083728
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
QC 20170410
2017-03-22
2017-03-22
2017-04-10
Bibliographically approved