Kristjan Voje (2018) Automatic construction of verb valency patterns for Slovene. EngD thesis.
Abstract
Natural language processing depends heavily on a sufficient amount of training data. When working with smaller datasets, we can enrich the data by analyzing the semantic structure of the language. In this thesis we work with valency. Valency carries information about the meaning of a sentence. Although valency is usually a feature of verbs, it can also be observed in adjectives and nouns. Valency forms valency patterns around its carriers. In theory, each sense of a valency carrier should form a distinguishable valency pattern. Valency patterns have a small feature space, which makes them suitable for training machine learning algorithms, and they contain enough information to distinguish the sense of the valency carrier. Our work is based on the ssj500k 2.1 corpus. More than half of the corpus is hand-annotated with semantic roles, from which we extracted valency patterns. We built a program for listing and analyzing the valency patterns. Since different verb senses should, in theory, form different valency patterns, we tested a number of clustering algorithms on the corpus sentences. The goal was to cluster the valency frames by sense similarity and to find sense-specific valency patterns. We implemented three versions of the Lesk algorithm and two versions of the k-means algorithm. For the knowledge-based Lesk algorithms we used data from SloWNet and SSKJ.
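As a rough illustration of the knowledge-based approach mentioned above, the following is a minimal sketch of the generic simplified Lesk idea: choose the sense whose dictionary gloss shares the most words with the sentence context. It is not one of the three versions implemented in the thesis. The function name `simplified_lesk`, the sense identifiers, and the English toy glosses are hypothetical; a real setup would take its glosses from a resource such as SloWNet or SSKJ and would normally lemmatize the words and filter stop words first.

```python
def simplified_lesk(context_words, sense_glosses):
    """Return the sense id whose gloss shares the most words with the context.

    context_words: list of tokens from the sentence containing the target word.
    sense_glosses: dict mapping a sense id to its gloss (a plain string).
    Ties are resolved in favour of the sense listed first.
    """
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sense_glosses.items():
        # Overlap = number of distinct words shared by the gloss and the context.
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Hypothetical toy glosses, invented for illustration only.
glosses = {
    "run_move": "move fast on foot",
    "run_operate": "operate or manage a machine or business",
}
sentence = "she will run the business and manage the staff".split()
chosen = simplified_lesk(sentence, glosses)
print(chosen)
```

Here the second gloss shares "manage" and "business" with the sentence while the first shares nothing, so the second sense is selected. With an empty context every overlap is zero and the first listed sense is returned.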