Compositional hierarchical model for music information retrieval

Matevž Pesek (2018) Compositional hierarchical model for music information retrieval. PhD thesis.

Preview

Abstract

In recent years, deep architectures, most commonly based on neural networks, have advanced the state of the art in many research areas. Due to the popularity and the success of deep neural-networks, other deep architectures, including compositional models, have been put aside from mainstream research. This dissertation presents the compositional hierarchical model as a novel deep architecture for music processing. Our main motivation was to develop and explore an alternative non-neural deep architecture for music processing which would be transparent, meaning that the encoded knowledge would be interpretable, trained in an unsupervised manner and on small datasets, and useful as a feature extractor for classification tasks, as well as a transparent model for unsupervised pattern discovery. We base our work on compositional models, as compositionality is inherent in music. The proposed compositional hierarchical model learns a multi-layer hierarchical representation of the analyzed music signals in an unsupervised manner. It provides transparent insights into the learned concepts and their structure. It can be used as a feature extractor—its output can be used for classification tasks using existing machine learning techniques. Moreover, the model’s transparency enables an interpretation of the learned concepts, so the model can be used for analysis (exploration of the learned hierarchy) or discovery-oriented (inferring the hierarchy) tasks, which is difficult with most neural network based architectures. The proposed model uses relative coding of the learned concepts, which eliminates the need for large annotated training datasets that are essential in deep architectures with a large number of parameters. Relative coding contributes to slim models, which are fast to execute and have low memory requirements. The model also incorporates several biologically-inspired mechanisms that are modeled according to the mechanisms that exists at the lower levels of human perception (e.g. lateral inhibition in the human ear) and that significantly affect perception. The proposed model is evaluated on several music information retrieval tasks and its results are compared to the current state of the art. The dissertation is structured as follows. In the first chapter we present the motivation for the development of the new model. In the second chapter we elaborate on the related work in music information retrieval and review other compositional and transparent models. Chapter three introduces a thorough description of the proposed model. The model structure, its learning and inference methods are explained, as well as the incorporated biologically-inspired mechanisms. The model is then applied to several different music domains, which are divided according to the type of input data. In this we follow the timeline of the development and the implementation of the model. In chapter four, we present the model’s application to audio recordings, specifically for two tasks: automatic chord estimation and multiple fundamental frequency estimation. In chapter five, we present the model’s application to symbolic music representations. We concentrate on pattern discovery, emphasizing the model’s ability to tackle such problems. We also evaluate the model as a feature generator for tune family classification. Finally, in chapter six, we show the latest progress in developing the model for representing rhythm and show that it exhibits a high degree of robustness in extracting high-level rhythmic structures from music signals. We conclude the dissertation by summarizing our work and the results, elaborating on forthcoming work in the development of the model and its future applications.

Item Type:

Thesis (PhD thesis)

Keywords:

music information retrieval, deep learning architectures, automated chord estimation, multiple fundamental frequency estimation, pattern discovery, rhythm modeling

Number of Pages:

132

Language of Content:

English

Mentor / Comentors:

Name and Surname	ID	Function
izr. prof. dr. Matija Marolt	271	Mentor
prof. dr. Aleš Leonardis	29	Comentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537957571)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

4257

Date Deposited:

26 Sep 2018 17:22

Last Modified:

08 Oct 2018 08:45

URI:

http://eprints.fri.uni-lj.si/id/eprint/4257

Actions (login required)

View Item