PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-5953-7310
Adam Mickiewicz University, Poznań, Poland.
Adam Mickiewicz University, Poznań, Poland. ORCID iD: 0000-0003-0172-7853
2024 (English)
In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, European Language Resources Association (ELRA), 2024, p. 13068-13073
Conference paper, Published paper (Refereed)
Abstract [en]

We present a speech database and a phoneme-level language model of Polish. The database and model are designed for the analysis of prosodic and discourse factors and their impact on acoustic parameters in interaction with predictability effects. The database is also the first large, publicly available Polish speech corpus of excellent acoustic quality that can be used for phonetic analysis and training of multi-speaker speech technology systems. The speech in the database is processed in a pipeline that achieves a 90% degree of automation. It incorporates state-of-the-art, freely available tools enabling database expansion or adaptation to additional languages.
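The record does not describe how the PRODIS phoneme-level language model is built. The sketch below only illustrates the quantity named in the keywords. Phoneme surprisal is the negative log probability of a phoneme given its context, -log2 P(phoneme | context), measured in bits. The sketch makes several assumptions that do not come from the paper:

- It uses a simple bigram model with add-one (Laplace) smoothing.
- Toy letter strings stand in for real phoneme transcriptions.
- `#` marks word boundaries.

The helper names `train_bigram` and `surprisal` are hypothetical and are not part of any released PRODIS tooling.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count bigrams and their left contexts over phoneme sequences.

    Hypothetical helper: '#' is an assumed word-boundary symbol added
    at both ends of every sequence.
    """
    context_counts = Counter()  # how often each symbol starts a bigram
    bigram_counts = Counter()   # how often each (prev, cur) pair occurs
    inventory = set()           # all symbols seen, including '#'
    for seq in sequences:
        padded = ["#"] + list(seq) + ["#"]
        inventory.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, inventory


def surprisal(prev, cur, context_counts, bigram_counts, inventory):
    """Surprisal in bits, -log2 P(cur | prev), with add-one smoothing."""
    v = len(inventory)
    p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
    return -math.log2(p)


if __name__ == "__main__":
    # Toy "transcriptions"; real input would be phoneme strings from the corpus.
    ctx, bi, inv = train_bigram(["kot", "kot", "tok"])
    for nxt in ["k", "t", "o"]:
        print(nxt, round(surprisal("#", nxt, ctx, bi, inv), 3))
```

A frequent word-initial transition (`#` followed by `k`) receives lower surprisal than a rarer one (`#` followed by `t`). An unseen transition (`#` followed by `o`) gets the highest value, and smoothing keeps it finite. Add-one smoothing was chosen only for brevity. A model built for studying predictability effects would typically use longer contexts and a better smoothing or neural estimator.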

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024, p. 13068-13073
Keywords [en]
database, language model, Polish, probabilistic effects, surprisal
National Category
Natural Language Processing; Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-348780
Scopus ID: 2-s2.0-85195946964
OAI: oai:DiVA.org:kth-348780
DiVA, id: diva2:1878690
Conference
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, May 20 2024 - May 25 2024
Note

Part of ISBN 9782493814104

QC 20240701

Available from: 2024-06-27; Created: 2024-06-27; Last updated: 2025-02-01. Bibliographically approved

Authority records

Malisz, Zofia

Authors/editors in DiVA
Malisz, Zofia; Kul, Małgorzata
