ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Modelling words co-occurrence with machine learning

Ruben Sipoš (2009) Modelling words co-occurrence with machine learning. EngD thesis.

    Abstract

    Advances in machine learning and growing computing power open new possibilities for data processing and knowledge acquisition. One of the key questions in automatic text analysis is how to acquire semantic information. One possible approach is to model semantics through word co-occurrence. In this work we developed an approach for building models from word n-grams, with the models represented as triples consisting of a subject, a predicate and an object. We used the Google n-grams, which are constructed from Google's index of web pages. Because this is one of the largest datasets of its type, we paid special attention to processing it efficiently. We also justify the choice of the triple representation and describe how to compute triples efficiently, since current approaches are time-consuming. We propose a new procedure for constructing models of word co-occurrence, in which each model represents a set of triples using more general concepts. We also report evaluation results, which indicate the potential usefulness of the approach. We conclude with some ideas for further research.

    Item Type: Thesis (EngD thesis)
    Keywords: machine learning, word n-grams, triples, triple extraction, modeling word co-occurrences, concept abstraction (using background knowledge)
    Number of Pages: 50
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname          ID     Function
    doc. dr. Janez Demšar     257    Mentor
    doc. dr. Dunja Mladenić          Comentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=7141972)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 863
    Date Deposited: 09 Jun 2009 11:47
    Last Modified: 13 Aug 2011 00:35
    URI: http://eprints.fri.uni-lj.si/id/eprint/863
