Ciril Bohak (2016) Finding the most representative part of vocal folksongs with transcription and segmentation. PhD thesis.
Abstract
The goal of music segmentation is to develop algorithms that find similar patterns in an audio signal according to a desired aspect (melody, rhythm, timbre) and define the boundaries between the repetitions. The goal of music transcription is to develop algorithms that extract pitches from the audio signal in every time frame, for either monophonic or polyphonic music. Music segmentation and transcription are two very important parts of the music information retrieval research field. Their results can be used in many real-life applications. Music segmentation can be used to determine musical structure, to find melodic repetitions, or to search for the most representative part of a piece. Transcription results can be used to generate scores automatically, to support manual transcription, or to search for similar melodies in music collections. In this dissertation we address a specific problem of music segmentation and transcription of audio recordings: the segmentation and transcription of folk music recordings. Existing methods fail on folk music because of its specific characteristics. Poor recording conditions and amateur performers lead to high noise levels, inaccurate singing, pitch drift throughout the song, etc. In the introduction we give the motivation for the research and define the problems and goals of the thesis in detail. The first part of the dissertation presents research in the field of music segmentation. There we present a folk music segmentation method that outperforms current state-of-the-art methods on a collection of folk music. The segmentation method is based on a probabilistic model for finding melodically repeating parts of a recording and determining where they begin. The method was evaluated on a folk music collection of different types: solo singing, two- and three-voiced singing, choir songs, instrumental songs and mixed ensembles.
The developed method was also evaluated in terms of robustness, by testing its resistance to different degradations. The second part of the dissertation addresses music transcription, where we present a folk music transcription method. The method uses the segmentation results to find a representative part of a song and transcribes it using all the repetitions within the song. As input, the method takes multiple fundamental frequency estimates calculated with an existing method, together with the song segmentation. Using the segmentation results, the method aligns the multiple fundamental frequency estimates in the time and frequency domains, removes local inaccuracies and joins the transcriptions of all repeating parts. In the next stage, the method calculates notes using a two-level probabilistic model based on explicit-duration hidden Markov models, which model notes, rests and note transitions. The method was evaluated on a collection of polyphonic folk music, where it returns better results than current state-of-the-art music transcription methods. In the conclusions we highlight the scientific contributions of the thesis and give directions for possible future improvements and extensions of the method.
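The abstract does not specify how the fundamental frequency estimates of repeated parts are aligned in the frequency domain and joined. The following is a minimal, hypothetical sketch of one way to do it, not the thesis's method. It assumes the following:

- Each repetition's F0 contour (in Hz, with 0 for unvoiced frames) has already been aligned in time to a common length, for example by dynamic time warping, which is not shown.
- Pitch drift between repetitions can be approximated by a constant offset in cents.

The names `hz_to_cents` and `align_and_merge`, and the median-based merging, are illustrative choices.

```python
import warnings

import numpy as np


def hz_to_cents(f0_hz, ref_hz=440.0):
    """Convert F0 values in Hz to cents relative to ref_hz; frames with f0 <= 0 (unvoiced) become NaN."""
    f0 = np.asarray(f0_hz, dtype=float)
    cents = np.full_like(f0, np.nan)
    voiced = f0 > 0
    cents[voiced] = 1200.0 * np.log2(f0[voiced] / ref_hz)
    return cents


def align_and_merge(repetitions, ref_index=0):
    """Merge F0 contours of repetitions of one segment into a single contour in cents.

    Hypothetical sketch: assumes the contours are already time-aligned and of
    equal length. Each repetition is shifted by its median pitch offset to the
    reference repetition (frequency-domain alignment, compensating pitch drift),
    and the aligned contours are combined frame by frame with a median, which
    suppresses local errors such as an octave jump in a single repetition.
    """
    contours = [hz_to_cents(r) for r in repetitions]
    ref = contours[ref_index]
    aligned = []
    for c in contours:
        diff = c - ref
        # If the two contours share no voiced frame, leave this one unshifted.
        offset = 0.0 if np.all(np.isnan(diff)) else np.nanmedian(diff)
        aligned.append(c - offset)
    with warnings.catch_warnings():
        # Frames unvoiced in every repetition stay NaN; silence the all-NaN warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(np.vstack(aligned), axis=0)
```

The sketch uses a median for both the offset and the frame-wise merge. This keeps a single badly tracked frame or repetition from dominating the result. A mean would be simpler but less robust to the noisy, drifting recordings described above.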
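To illustrate what decoding with an explicit-duration hidden Markov model involves, here is a generic segmental Viterbi decoder for a hidden semi-Markov model. In such a model each state (for example a note or a rest) lasts a number of frames drawn from an explicit duration distribution. This is not the thesis's two-level note/rest/transition model. The function name `hsmm_viterbi`, the input layout and the toy probabilities are assumptions made for this sketch.

```python
import numpy as np


def hsmm_viterbi(log_emis, log_init, log_trans, log_dur):
    """Most likely segmentation under an explicit-duration (semi-Markov) HMM.

    Hypothetical sketch. Inputs are log-probabilities:
      log_emis:  (T, S) per-frame emission log-likelihoods (must be finite)
      log_init:  (S,) state of the first segment
      log_trans: (S, S) state switch i -> j; diagonal should be -inf so that
                 durations are modelled only by log_dur
      log_dur:   (S, D) segment of state j lasting d + 1 frames
    Returns a list of (state, start_frame, end_frame_exclusive).
    """
    T, S = log_emis.shape
    D = log_dur.shape[1]
    # cum[t, j] = sum of log_emis[:t, j], so a segment's likelihood is a difference.
    cum = np.vstack([np.zeros(S), np.cumsum(log_emis, axis=0)])
    # delta[t, j] = best score of frames [0, t) with a segment of state j ending at t.
    delta = np.full((T + 1, S), -np.inf)
    back = {}  # (t, j) -> (segment start, previous state or -1)
    for t in range(1, T + 1):
        for j in range(S):
            for d in range(1, min(D, t) + 1):
                start = t - d
                seg = cum[t, j] - cum[start, j] + log_dur[j, d - 1]
                if start == 0:
                    score, prev = log_init[j] + seg, -1
                else:
                    cand = delta[start] + log_trans[:, j]
                    prev = int(np.argmax(cand))
                    score = cand[prev] + seg
                if score > delta[t, j]:
                    delta[t, j] = score
                    back[(t, j)] = (start, prev)
    # Backtrack from the best final state.
    segments = []
    t, j = T, int(np.argmax(delta[T]))
    while t > 0:
        start, prev = back[(t, j)]
        segments.append((j, start, t))
        t, j = start, prev
    return segments[::-1]
```

The decoder scores whole segments rather than single frames. This lets a duration distribution penalise implausibly short or long notes, which a standard HMM with self-transitions can only model geometrically. The cost is an extra loop over durations up to `D`.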