Discovering clusters of related learning tasks for improving prediction models of individual tasks

Tadej Janež (2013) Discovering clusters of related learning tasks for improving prediction models of individual tasks. PhD thesis.

Abstract

Multi-task learning (MTL) is a machine learning paradigm concerned with learning models for multiple related learning tasks concurrently. In the context of MTL, related learning tasks are tasks from a common problem domain (e.g. sharing the same attribute space). This makes it possible to leverage the domain-specific information contained in the data of the related learning tasks when building a model for a particular task. Most earlier MTL approaches assume that all learning tasks are related and suitable for joint learning. However, merging the data of two unrelated learning tasks may result in worse performance for both tasks. Thus, the question of how to eliminate this assumption and merge the data of different learning tasks only selectively, so that unrelated tasks do not influence each other, represents a major challenge in the area of MTL. The doctoral thesis presents a new MTL method, named Error-reduction merging (ERM), which belongs to the group of newer MTL approaches that try to eliminate the assumption that all tasks are related. ERM automatically discovers clusters of related learning tasks (solely from their data) and merges the data of learning tasks belonging to the same cluster, which improves the prediction models of individual learning tasks. To define relatedness, ERM assumes that two tasks are related if the model built on their merged data has a lower prediction error than the models built on the separate data of each learning task. The number of clusters does not need to be given; ERM determines it automatically. Another advantage of ERM over the majority of other MTL approaches is that it is not tied to a particular base learner. Rather, any supervised learning method can be used inside ERM.
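The relatedness criterion described above can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the procedure from the thesis: it assumes a greedy, agglomerative loop that repeatedly merges the pair of clusters with the largest reduction in leave-one-out error and stops when no merge reduces the error. The names `lookup_predict`, `loo_errors` and `erm_clusters`, the toy base learner, and the toy Boolean tasks are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def lookup_predict(train, x):
    """Toy base learner (an assumption, not from the thesis): majority label
    among training examples with identical attributes, else overall majority."""
    labels = [y for xi, y in train if xi == x] or [y for _, y in train]
    return Counter(labels).most_common(1)[0][0]


def loo_errors(data, predict=lookup_predict):
    """Number of leave-one-out misclassifications on `data`."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if not train or predict(train, x) != y:
            errors += 1
    return errors


def erm_clusters(tasks, predict=lookup_predict):
    """Greedy sketch: merge the pair of clusters whose pooled data reduces the
    error the most; stop when no merge reduces it. The base learner is
    pluggable through `predict`."""
    clusters = [[t] for t in range(len(tasks))]

    def pooled(cluster):
        return [ex for t in cluster for ex in tasks[t]]

    while len(clusters) > 1:
        best_gain, best_pair = 0, None
        for a, b in combinations(range(len(clusters)), 2):
            separate = (loo_errors(pooled(clusters[a]), predict)
                        + loo_errors(pooled(clusters[b]), predict))
            merged = loo_errors(pooled(clusters[a] + clusters[b]), predict)
            gain = separate - merged  # errors saved by merging
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge reduces the error
            break
        a, b = best_pair  # a < b, so deleting b keeps index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy problem: tasks 0 and 1 learn XOR, tasks 2 and 3 learn its negation.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_task = [(x, x[0] ^ x[1]) for x in inputs]
nxor_task = [(x, 1 - (x[0] ^ x[1])) for x in inputs]
tasks = [xor_task, list(xor_task), nxor_task, list(nxor_task)]

clusters = erm_clusters(tasks)
print(clusters)  # [[0, 1], [2, 3]]
```

In this sketch the number of clusters is never supplied: it falls out of the stopping rule, and swapping `predict` for another supervised learner leaves the merging loop unchanged.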
Next, we define three properties of an MTL problem that are important for the automatic discovery of clusters of related learning tasks aimed at improving the prediction models of individual learning tasks: the number of learning tasks, the numbers of examples of the individual learning tasks, and the number of clusters of different types of learning tasks. We then perform an extensive experimental study of the behavior of ERM with respect to these properties and, in addition, with respect to the degree of noise in the data and the base learner used inside ERM. Results on synthetic MTL problems of learning Boolean functions demonstrate that the success of ERM increases with the number of learning tasks and with the numbers of examples of the individual learning tasks. The difference in the area under the ROC curve (AUC) between ERM and two other approaches, one without merging (NoMerging) and the other with all tasks merged together (MergeAll), grows rapidly. At the same time, ERM's AUC quickly approaches that of learning with cluster membership given in advance (Oracle). Increasing the number of clusters of different types of learning tasks and increasing the degree of noise in the data both tend to decrease the success of ERM. In the first case, ERM's AUC values remain higher than those of NoMerging and MergeAll; however, the gap to the AUC values of the Oracle widens. In the second case, the AUC values of ERM remain relatively close to those of the Oracle for smaller degrees of noise, but they decrease quickly as the degree of noise increases. For the largest degree of noise in the data, ERM fails completely and performs similarly to or worse than the MergeAll method. Experiments with different base learners (support vector machines, decision trees, k-nearest neighbors) inside ERM showed that its success is, in general, independent of the base learner used.
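To make the three reference approaches and the AUC measure concrete, the sketch below shows which examples each approach would use to train the model for a given task, and computes AUC as the fraction of positive/negative pairs that the scores rank correctly (ties count one half). The function names, the toy data and the `true_clusters` argument are hypothetical; the evaluation protocol actually used in the thesis is not specified here.

```python
def training_pool(strategy, task, tasks, true_clusters=None):
    """Return the examples used to train the model for `task`."""
    if strategy == "NoMerging":   # only the task's own data
        return list(tasks[task])
    if strategy == "MergeAll":    # data of all tasks pooled together
        return [ex for data in tasks for ex in data]
    if strategy == "Oracle":      # data of the task's known cluster
        cluster = next(c for c in true_clusters if task in c)
        return [ex for t in cluster for ex in tasks[t]]
    raise ValueError(f"unknown strategy: {strategy}")


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: three tasks with one placeholder example each.
toy_tasks = [[("a", 1)], [("b", 0)], [("c", 1)]]
oracle_pool = training_pool("Oracle", 0, toy_tasks, true_clusters=[[0, 2], [1]])
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(oracle_pool, example_auc)  # [('a', 1), ('c', 1)] 0.75
```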
Results on real classification and regression MTL problems confirmed that ERM also performs well on such practical problems. The findings of a study on how to speed up the learning of an autonomous robotic agent by performing experiments in a more complex environment are fully consistent with the findings on ERM's behavior on synthetic MTL problems. We end the doctoral thesis with an interpretation of the binarization of attribute values when building decision trees in terms of ERM. The quantitative comparison revealed the superiority of ERM. Interestingly, the qualitative comparison showed no similarity between the structure of the trees, whose branches are split on the attribute id (the identifier of the learning task), and the dendrograms depicting the history of merging the learning tasks with ERM.

Item Type:

Thesis (PhD thesis)

Keywords:

artificial intelligence, machine learning, multi-task learning, supervised learning, Boolean functions, autonomous learning agents, learning speed, binarization of attribute values, decision trees

Number of Pages:

157

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
akad. prof. dr. Ivan Bratko	77	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=10334804)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

2302

Date Deposited:

13 Dec 2013 15:48

Last Modified:

02 Jan 2014 15:15

URI:

http://eprints.fri.uni-lj.si/id/eprint/2302
