Marko Kuder (2010) Extraction of Predominant Melody from Audio Recordings. EngD thesis.
Abstract
Audio melody extraction remains a problem that each annual MIREX competition shows to be far from solved. Algorithms developed for this purpose try to establish the melody track of a song (the frequency of the predominant tone at each moment) and to determine whether a melody is present at all. The competition results show that neither of these two problems has been completely solved, since the algorithms make errors even on songs that humans understand easily. In this thesis I describe my implementation of the PreFEst algorithm, developed by Masataka Goto from 1999 to 2004. It is based on a promising approach that was very competitive at the time but has not been developed further by its author. I propose my own implementation of the algorithm (without Goto's version of tracking) with several possible improvements: voicing detection, alternative spectrogram calculation with an additional level in the multi-rate filter bank and an optional combination of multiple window sizes, iterative tracking of peaks, outlier elimination, hypothesis balancing using best-successor evaluation, transition recognition using the Hough transform, and adaptation of hypotheses to inter-frequency-bin values. I tested my expanded version of Goto's algorithm on the ISMIR 2004 competition database and the MIREX 2005 learning set, and compared my results with those of other algorithms from previous audio melody extraction competitions. I established the effect of the individual improvements and identified possible weaknesses and strengths of the algorithm by analysing several hypotheses calculated on the test data.