This study presents a database of controlled speech material and spontaneous Swedish conversation produced in modal and whispered voice. The database includes facial expression and head movement features tracked by a non-invasive and unobtrusive system. We analyse differences between the voice conditions in the visual domain, paying particular attention to realisations of prosodic structure, namely prominence patterns. The results show that, relative to modal speech, prominent vowels in whisper are produced with statistically significantly a) larger jaw opening, b) stronger lip rounding and protrusion, c) higher eyebrow raising and d) higher pitch angle velocity of the head.