This paper focuses on the transcription of performance expression, in particular legato slurs, in solo violin performance. Such transcription can improve automatic music transcription and enrich the resulting notations with expression markings. We review past work in expression detection and find that while legato detection has been explored, its transcription has not. We propose a method for demarcating the beginning and ending of slurs in a performance by combining pitch and onset information produced by ScoreCloud (a music notation software with transcription capabilities) with articulated onsets detected by a convolutional neural network. To train this system, we build a dataset of solo bowed violin performance featuring three different musicians playing several exercises and tunes. We test the resulting method on a small collection of recordings of the same excerpt of music performed by five different musicians. We find that this signal-based method works well in cases where the acoustic conditions do not substantially interfere with the onset strengths. Further work will explore data augmentation to make the articulation detection more robust, as well as an end-to-end solution.
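As a minimal sketch of the slur-demarcation idea described above: given note onsets from a transcription system and a separate list of articulated onsets from a detector, notes whose onsets are not articulated can be grouped with the preceding note into a single slur. The function name, tolerance parameter, and grouping logic here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: group transcribed notes into slurs using a separate
# list of articulated onsets. A slur starts at an articulated note and
# extends over following notes whose onsets are not articulated.
# `tol` (matching tolerance in seconds) is an assumed parameter.

def group_slurs(note_onsets, articulated_onsets, tol=0.05):
    """Return (start_index, end_index) spans over note_onsets, one per slur."""
    def is_articulated(t):
        # A note onset counts as articulated if an articulation was
        # detected within `tol` seconds of it.
        return any(abs(t - a) <= tol for a in articulated_onsets)

    slurs, start = [], 0
    for i, t in enumerate(note_onsets):
        if i > 0 and is_articulated(t):
            # An articulated onset closes the current slur and opens a new one.
            slurs.append((start, i - 1))
            start = i
    slurs.append((start, len(note_onsets) - 1))
    return slurs

# Example: notes at 0, 0.5, 1.0, 1.5 s; articulations detected near 0 and 1.0 s
print(group_slurs([0.0, 0.5, 1.0, 1.5], [0.0, 1.0]))  # → [(0, 1), (2, 3)]
```

In this toy example the second and fourth notes lack articulated onsets, so they are slurred to the notes before them, yielding two two-note slurs.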
Part of ISBN 9789152773727
QC 20230525