ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Automatic construction of verb valency patterns for Slovene

Kristjan Voje (2018) Automatic construction of verb valency patterns for Slovene. EngD thesis.


    Abstract

    Natural language processing depends heavily on a sufficient amount of training data. When working with smaller datasets, we can enrich the data by analyzing the semantic structure of the language. This thesis focuses on valency, which carries information about the meaning of a sentence. While valency is usually a feature of verbs, it can also be observed in adjectives and nouns. Valency forms valency patterns around its carriers. In theory, each sense of a valency carrier should form a distinguishable valency pattern. Valency patterns have a small feature space, which makes them suitable for training machine learning algorithms, and they contain enough information to distinguish the sense of the valency carrier. Our work is based on the ssj500k 2.1 corpus. Over half of the corpus contains hand-annotated semantic roles, from which we extracted valency patterns. We built a program for listing and analyzing the valency patterns. Because different verb senses should, in theory, form different valency patterns, we tested a number of clustering algorithms on the corpus sentences. The goal was to cluster the valency frames by similar senses and to find sense-specific valency patterns. We implemented three versions of the Lesk algorithm and two versions of the k-means algorithm. We used data from SloWNet and SSKJ for the knowledge-based Lesk algorithms.
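
The abstract mentions knowledge-based Lesk algorithms but does not describe the three versions that were implemented. As a rough illustration only, the sketch below shows the general idea of a simplified Lesk variant: choose the sense whose dictionary gloss shares the most content words with the sentence. The function names, the stopword list, and the English toy glosses are all hypothetical; the thesis itself drew its sense descriptions from SloWNet and SSKJ, and its implementations may differ substantially.

```python
import re

# Hypothetical, deliberately tiny stopword list for the toy example.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "or", "and", "by", "with", "for"}


def tokenize(text):
    """Lowercase the text and return the set of its alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def simplified_lesk(context, glosses):
    """Return the sense id whose gloss overlaps most with the context.

    `glosses` maps a sense id to its dictionary gloss. On a tie, the sense
    listed first wins, because only a strictly larger overlap replaces it.
    """
    context_words = tokenize(context) - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense_id, gloss in glosses.items():
        overlap = len(context_words & (tokenize(gloss) - STOPWORDS))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Hypothetical toy glosses for the English verb "run" (not taken from SloWNet or SSKJ).
GLOSSES = {
    "run.move": "move fast on foot, faster than walking",
    "run.operate": "operate or manage a business, machine, or program",
}

print(simplified_lesk("She runs a small business in Ljubljana", GLOSSES))
```

In this toy setup the only shared content word is "business", so the second sense is selected. A realistic version for Slovene would need lemmatization before comparing words, since inflected forms rarely match gloss entries exactly.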

    Item Type: Thesis (EngD thesis)
    Keywords: valency frame, valency, verb
    Number of Pages: 43
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname                  ID     Function
    prof. dr. Marko Robnik Šikonja    276    Mentor
    doc. dr. Apolonija Gantar                Comentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1538103235)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 4336
    Date Deposited: 07 Jan 2019 10:42
    Last Modified: 21 Jan 2019 10:39
    URI: http://eprints.fri.uni-lj.si/id/eprint/4336
